Stephen Pickles UKLight Town Meeting, NeSC, Edinburgh, 9/9/2004 TeraGyroid HPC Applications.


1 Stephen Pickles UKLight Town Meeting, NeSC, Edinburgh, 9/9/2004 TeraGyroid HPC Applications ready for UKLight

2 The TeraGyroid Project
• Funded by EPSRC (UK) and NSF (USA) to join the UK e-Science Grid and US TeraGrid
  – application from RealityGrid, a UK e-Science Pilot Project
  – 3-month project including work exhibited at SC’03 and SC Global, Nov 2003
  – thumbs up from TeraGrid mid-September, funding from EPSRC approved later
• Main objective was to deliver high-impact science that could not be performed without the combined resources of the US and UK grids
• Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods
  – featured world’s largest lattice-Boltzmann simulation
  – 1024^3 cell simulation of gyroid phase demands terascale computing, hence “TeraGyroid”

3 Networking
[Diagram labels: HPC engine, visualization engine, storage; data flows: checkpoint files, visualization data, compressed video, steering (control and status)]

4 LB3D: 3-dimensional Lattice-Boltzmann simulations
• LB3D code is written in Fortran90 and parallelized using MPI
• Scales linearly on all available resources (Lemieux, HPCx, CSAR, Linux/Itanium II clusters)
• Data produced during a single run can range from hundreds of gigabytes to terabytes
• Simulations require supercomputers
• High-end visualization hardware (e.g. SGI Onyx, dedicated viz clusters) and parallel rendering software (e.g. VTK) needed for data analysis
[Figure: 3D datasets showing snapshots from a simulation of spinodal decomposition: a binary mixture of water and oil phase-separates. ‘Blue’ areas denote high water densities and ‘red’ visualizes the interface between both fluids.]

5 Computational Steering of Lattice Boltzmann Simulations
• LB3D instrumented for steering using the RealityGrid steering library.
• Malleable checkpoint/restart functionality allows ‘rewinding’ of simulations and run-time job migration across architectures.
• Steering reduces storage requirements because the user can adapt data dumping frequencies.
• CPU time can be saved because users do not have to wait for jobs to finish if they can already see that nothing relevant is happening.
• Instead of doing “task farming”, parameter searches are accelerated by “steering” through parameter space.
• Analysis time is significantly reduced because less irrelevant data is produced.
Applied to study of gyroid mesophase of amphiphilic liquid crystals at unprecedented space and time scales
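The steering ideas on this slide (adjustable dump frequency, checkpointing, rewinding, changing parameters mid-run) can be illustrated with a toy loop. This is only a sketch: `SteerableSim`, its methods and the `coupling` parameter are hypothetical names invented for illustration and are not the RealityGrid steering library API, and the "simulation" is a random walk standing in for LB3D.

```python
import random


class SteerableSim:
    """Toy steerable simulation. Hypothetical; not the RealityGrid API."""

    STEERABLE = ("coupling", "output_every")

    def __init__(self, coupling, output_every, seed=0):
        self.coupling = coupling          # stand-in for a physical parameter
        self.output_every = output_every  # data dumping frequency, in steps
        self.step = 0
        self.state = 0.0                  # stand-in for the lattice state
        self.rng = random.Random(seed)
        self.checkpoints = {}             # step -> (state, RNG state)
        self.outputs = []                 # (step, state) records dumped so far

    def advance(self, n_steps):
        """Run n_steps, dumping output every `output_every` steps."""
        for _ in range(n_steps):
            self.state += self.coupling * self.rng.random()
            self.step += 1
            if self.step % self.output_every == 0:
                self.outputs.append((self.step, self.state))

    def checkpoint(self):
        """Save everything needed to resume from the current step."""
        self.checkpoints[self.step] = (self.state, self.rng.getstate())

    def rewind(self, step):
        """Restore a saved checkpoint and discard output produced after it."""
        self.state, rng_state = self.checkpoints[step]
        self.rng.setstate(rng_state)
        self.step = step
        self.outputs = [rec for rec in self.outputs if rec[0] <= step]

    def steer(self, **params):
        """Change steerable parameters while the run is in progress."""
        for name, value in params.items():
            if name not in self.STEERABLE:
                raise KeyError(f"{name} is not steerable")
            setattr(self, name, value)


def demo():
    sim = SteerableSim(coupling=1.0, output_every=5)
    sim.advance(20)
    sim.checkpoint()
    sim.advance(20)   # the user watches and sees nothing relevant happening
    sim.rewind(20)    # go back to the checkpoint instead of starting over
    sim.steer(coupling=0.5, output_every=20)  # new parameter, sparser dumps
    sim.advance(20)
    return sim
```

In this sketch the rewound branch leaves no output behind, and the second attempt dumps data less often, which is the storage and analysis saving the slide describes.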

6 Parameter space exploration
[Figure panels:]
• Initial condition: random water/surfactant mixture.
• Self-assembly starts.
• Rewind and restart from checkpoint.
• Lamellar phase: surfactant bilayers between water layers.
• Cubic micellar phase, low surfactant density gradient.
• Cubic micellar phase, high surfactant density gradient.

7 Strategy
• Aim: use federated resources of US TeraGrid and UK e-Science Grid to accelerate scientific process
• Rapidly map out parameter space using large number of independent “small” (128^3) simulations
  – use job cloning and migration to exploit available resources and save equilibration time
• Monitor their behaviour using on-line visualization
• Hence identify parameters for high-resolution simulations on HPCx and Lemieux
  – 1024^3 on Lemieux (PSC) – takes 0.5 TB to checkpoint!
  – create initial conditions by stacking smaller simulations with periodic boundary conditions
• Selected 128^3 simulations were used for long-time studies
• All simulations monitored and steered by geographically distributed team of computational scientists
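The numbers on this slide can be checked with a little arithmetic, and the stacking idea can be shown on a tiny lattice. The sketch below assumes that "0.5 TB" means 0.5 × 2^40 bytes and that the large system is built by repeating one small cubic block along every axis; the slide does not say exactly how LB3D performs the stacking, and all function names are illustrative.

```python
def checkpoint_bytes_per_site(checkpoint_bytes, side):
    """Average checkpoint size per lattice site for a side^3 lattice."""
    return checkpoint_bytes / side ** 3


def tiles_needed(big_side, small_side):
    """Number of small cubic blocks needed to fill the large cubic lattice."""
    if big_side % small_side:
        raise ValueError("big_side must be a multiple of small_side")
    return (big_side // small_side) ** 3


def tile_periodic(block, reps):
    """Stack a cubic block (nested lists) `reps` times along each axis.

    If the block was equilibrated with periodic boundary conditions, its
    opposite faces match, so the copies join without a seam.
    """
    n = len(block)
    big = n * reps
    return [[[block[i % n][j % n][k % n] for k in range(big)]
             for j in range(big)]
            for i in range(big)]


# Assumption: 0.5 TB taken as 0.5 * 2**40 bytes.
per_site = checkpoint_bytes_per_site(0.5 * 2 ** 40, 1024)  # 512.0 bytes
blocks = tiles_needed(1024, 128)                           # 8**3 = 512 blocks
```

Under these assumptions the 1024^3 checkpoint works out to 512 bytes per lattice site, and 512 copies of a 128^3 block fill the large lattice.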

8 The Architecture of Steering
[Diagram labels: Steering client, Simulation, Steering library, Visualization, Registry, Steering GS, Client, Display; connections: connect, publish, find, bind, data transfer (Globus-IO)]
• components start independently and attach/detach dynamically
• remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid
• multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java)
• OGSI middle tier
• Computations run at HPCx, CSAR, SDSC, PSC and NCSA
• Visualizations run at Manchester, UCL, Argonne, NCSA, Phoenix
• Scientists in 4 sites steer calculations, collaborating via Access Grid
• Visualizations viewed remotely
• Grid services run anywhere

9 SC Global ’03 Demonstration

10 TeraGyroid Testbed
[Network diagram. Sites: PSC, ANL, NCSA, Phoenix, Caltech, SDSC, UCL, Daresbury, Manchester. Networks and exchange points: Starlight (Chicago), Netherlight (Amsterdam), BT provision, SJ4, MB-NG, production network. Legend: Visualization, Computation, Network PoP, Access Grid node, Service Registry, Dual-homed system. Link speeds: 10 Gbps, 2 x 1 Gbps]

11 Trans-Atlantic Network
Collaborators:
• Manchester Computing
• Daresbury Laboratory Networking Group
• MB-NG and UKERNA
• UCL Computing Service
• BT
• SURFnet (NL)
• Starlight (US)
• Internet-2 (US)

12 TeraGyroid: Hardware Infrastructure
Computation (using more than 6000 processors) including:
• HPCx (Daresbury), 1280 procs IBM Power4 Regatta, 6.6 Tflops peak, TB
• Lemieux (PSC), 3000 procs HP/Compaq, 3 TB memory, 6 Tflops peak
• TeraGrid Itanium2 cluster (NCSA), 256 procs, 1.3 Tflops peak
• TeraGrid Itanium2 cluster (SDSC), 256 procs, 1.3 Tflops peak
• Green (CSAR), SGI Origin 3800, 512 procs, TB memory (shared)
• Newton (CSAR), SGI Altix 3700, 256 Itanium 2 procs, 384 GB memory (shared)
Visualization:
• Bezier (Manchester), SGI Onyx 300, 6xIR3, 32 procs
• Dirac (UCL), SGI Onyx 2, 2xIR3, 16 procs
• SGI loan machine, Phoenix, SGI Onyx 1xIR4, 1xIR3, commissioned on site
• TeraGrid Visualization Cluster (ANL), Intel Xeon
• SGI Onyx (NCSA)
Service Registry:
• Frik (Manchester), Sony Playstation2
Storage:
• 20 TB of science data generated in project
• 2 TB moved to long-term storage for on-going analysis: Atlas Petabyte Storage System (RAL)
Access Grid nodes at Boston University, UCL, Manchester, Martlesham, Phoenix (4)

13 Network lessons
• Less than three weeks to debug networks
  – applications people and network people nodded wisely but didn’t understand each other
  – middleware such as GridFTP is infrastructure to applications folk, but an application to network folk
  – rapprochement necessary for success
• Grid middleware not designed with dual-homed systems in mind
  – HPCx, CSAR (Green) and Bezier are busy production systems
  – had to be dual-homed on SJ4 and MB-NG
  – great care with routing
  – complication: we needed to drive everything from laptops that couldn’t see the MB-NG network
• Many other problems encountered
  – but nothing that can’t be fixed once and for all given persistent infrastructure

14 Measured Transatlantic Bandwidths during SC’03

15 TeraGyroid: Summary
• Real computational science...
  – Gyroid mesophase of amphiphilic liquid crystals
  – Unprecedented space and time scales
  – investigating phenomena previously out of reach
• ...on real Grids...
  – enabled by high-bandwidth networks
• ...to reduce time to insight
[Figure labels: Interfacial Surfactant Density Dislocations]

16 TeraGyroid: Collaborating Organisations
Our thanks to hundreds of individuals at: Argonne National Laboratory (ANL); Boston University; BT; BT Exact; Caltech; CSC; Computing Services for Academic Research (CSAR); CCLRC Daresbury Laboratory; Department of Trade and Industry (DTI); Edinburgh Parallel Computing Centre; Engineering and Physical Sciences Research Council (EPSRC); Forschungszentrum Jülich; HLRS (Stuttgart); HPCx; IBM; Imperial College London; National Center for Supercomputing Applications (NCSA); Pittsburgh Supercomputing Center; San Diego Supercomputer Center; SCinet; SGI; SURFnet; TeraGrid; Tufts University, Boston; UKERNA; UK Grid Support Centre; University College London; University of Edinburgh; University of Manchester

17 The TeraGyroid Experiment
S. M. Pickles (1), R. J. Blake (2), B. M. Boghosian (3), J. M. Brooke (1), J. Chin (4), P. E. L. Clarke (5), P. V. Coveney (4), N. González-Segredo (4), R. Haines (1), J. Harting (4), M. Harvey (4), M. A. S. Jones (1), M. McKeown (1), R. L. Pinning (1), A. R. Porter (1), K. Roy (1), and M. Riding (1).
(1) Manchester Computing, University of Manchester
(2) CLRC Daresbury Laboratory, Daresbury
(3) Tufts University, Massachusetts
(4) Centre for Computational Science, University College London
(5) Department of Physics & Astronomy, University College London

18 New Application at AHM2004 Philip Fowler, Peter Coveney, Shantenu Jha and Shunzhou Wan UK e-Science All Hands Meeting 31 August – 3 September 2004 “Exact” calculation of peptide-protein binding energies by steered thermodynamic integration using high-performance computing grids.

19 Why are we studying this system?
• Measuring binding energies is vital for e.g. designing new drugs.
• Calculating a peptide-protein binding energy can take weeks to months.
• We have developed a grid-based method to accelerate this process
To compute ΔG_bind during the AHM 2004 conference, i.e. in less than 48 hours
Using federated resources of UK National Grid Service and US TeraGrid

20 Thermodynamic Integration on Computational Grids
[Workflow diagram, drawn against a time axis t:]
• Starting conformation
• Seed successive simulations at λ = 0.1, 0.2, 0.3, …, 0.9 (10 sims, each 2 ns)
• Check for convergence
• Combine and calculate integral
• Use steering to launch, spawn and terminate λ-jobs
• Run each independent job on the Grid
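The "combine and calculate integral" step can be sketched as follows. In thermodynamic integration the free energy difference is the integral over λ from 0 to 1 of the ensemble average of ∂H/∂λ, estimated here with the trapezoidal rule over the λ windows. The quadrature rule, the convergence test (comparing the means of the two halves of a time series) and all names are assumptions made for illustration; the slides do not say which the authors used.

```python
def mean(values):
    return sum(values) / len(values)


def is_converged(samples, tolerance):
    """Crude convergence check (an assumption, not the authors' criterion):
    the means of the first and second half of the series agree within
    `tolerance`."""
    half = len(samples) // 2
    if half == 0:
        raise ValueError("need at least two samples")
    return abs(mean(samples[:half]) - mean(samples[half:])) <= tolerance


def thermodynamic_integration(lambdas, mean_dh_dlambda):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda.

    lambdas          -- strictly increasing lambda values, one per window
    mean_dh_dlambda  -- converged ensemble average of dH/dlambda per window
    """
    if len(lambdas) != len(mean_dh_dlambda):
        raise ValueError("one average is needed per lambda window")
    if len(lambdas) < 2:
        raise ValueError("need at least two lambda windows")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        if width <= 0:
            raise ValueError("lambdas must be strictly increasing")
        total += 0.5 * width * (mean_dh_dlambda[i] + mean_dh_dlambda[i - 1])
    return total
```

Because each λ window contributes an independent average, the windows can run as separate jobs on different machines and only the per-window means need to be collected for the final sum.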

21 [Figure labels: monitoring; checkpointing; steering and control]

22 We successfully ran many simulations…
• This is the first time we have completed an entire calculation.
  – Insight gained will help us improve the throughput.
• The simulations were started at 5pm on Tuesday and the data was collated at 10am Thursday.
• 26 simulations were run
• At 4.30pm on Wednesday, we had nine simulations in progress (140 processors)
  – 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL
• We simulated over 6.8 ns of classical molecular dynamics in this time

23 Very preliminary results
ΔG (kcal/mol)
Experiment: -1.0 ± 0.3
“Quick and dirty” analysis*: -9 to -12
* as at 41 hours
We expect our value to improve with further analysis around the endpoints.

24 Conclusions
• We can harness today’s grids to accelerate high-end computational science
• On-line visualization and job migration require high-bandwidth networks
• Need persistent network infrastructure
  – else set-up costs are too high
• QoS: Would like ability to reserve bandwidth
  – and processors, graphics pipes, AG rooms, virtual venues, nodops... (but that’s another story)
• Hence our interest in UKLight
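The bandwidth requirement for job migration can be made concrete with the figures given earlier in the talk: a 0.5 TB checkpoint and links of 10 Gbps and 2 x 1 Gbps. The sketch below is back-of-envelope arithmetic only. It assumes 0.5 TB means 0.5 × 2^40 bytes and that the link runs at its nominal rate unless an `efficiency` factor (a made-up parameter) says otherwise; it uses no measured throughput from the project.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Idealised time to move `size_bytes` over a link.

    efficiency -- fraction of the nominal rate actually achieved
                  (hypothetical knob; 1.0 means the full line rate)
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


CHECKPOINT_BYTES = 0.5 * 2 ** 40  # assumption: binary terabytes

seconds_at_10g = transfer_seconds(CHECKPOINT_BYTES, 10e9)    # about 440 s
seconds_at_2x1g = transfer_seconds(CHECKPOINT_BYTES, 2 * 1e9)  # about 2200 s
```

At the full line rate this is roughly seven minutes on the 10 Gbps link and over half an hour on 2 x 1 Gbps, before any protocol overhead or contention, which gives a sense of why migrating the largest jobs depends on the network.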

