A Grid approach to Environmental Molecular Simulations: Deployment and use of Condor within the eMinerals Mini Grid. Paul Wilson 1, Mark Calleja 2, John Brodholt 1, Martin Dove 2, Maria Alfreddson 1, Zhimei Du 3, Nora H. de Leeuw 3, Arnaud Marmier 4 and Rik Tyer 5. 1. Department of Earth Sciences, University College London, Gower Street, London WC1E 6BT, UK 2. Department of Earth Sciences, University of Cambridge, Downing Street, Cambridge, CB2 3EQ, UK 3. Birkbeck College, University of London, Malet Street, London WC1E 7HX, UK 4. Department of Chemistry, University of Bath, Bath, BA2 7AY, UK 5. Daresbury Laboratory, Daresbury, Cheshire, WA4 4AD, UK. Environment from the Molecular Level: A NERC eScience testbed project
This talk: Part 1
1. The eMinerals problem area
2. The computational job types this generates
3. How Condor can help to sort these jobs out
4. What we gain from Condor and where to go next
5. UK institutional Condor programmes and the road ahead

This talk: Part 2
1. Condor's additional features and how we use them
2. The eMinerals minigrid
3. Conclusion
THE PROBLEM AREA.
1. Simulation of pollutants in the environment: binding of heavy metals and organic molecules in soils.
2. Studies of materials for long-term nuclear waste encapsulation: radioactive waste leaching through ceramic storage media.
3. Studies of weathering and scaling: mineral/water interface simulations, e.g. oil well scaling.

Codes relying on empirical descriptions of interatomic forces:
DL-POLY - molecular dynamics simulations
GULP - lattice energy/lattice dynamics simulations
METADISE - interface simulations

Codes using a quantum mechanical description of interactions between atoms:
CRYSTAL - Hartree-Fock implementation
SIESTA - Density Functional Theory, numerical basis sets to describe electronic wave functions
ABINIT - DFT, plane wave descriptions of electronic wave functions

WHAT TYPE OF JOBS WILL THESE PROBLEMS BE MANIFESTED AS?
2 TYPES OF JOB:
1) High to mid performance: requiring powerful resources, potential process intercommunication, long execution times; CPU and memory intensive.
2) Low performance/high throughput: requiring access to many hundreds or thousands of PC-level CPUs; no process intercommunication, short execution times, low memory usage.

WHERE CAN WE GET THE POWER?
TYPE 1 JOB: Masses of UK HPC resources around - it seems that UK grid resources are largely HPC!
TYPE 2 JOB: ????????

THERE HAS GOT TO BE A BETTER WAY TO OPTIMISE TYPE 2 JOBS!
…AND THERE IS: WE USE WHAT'S ALREADY THERE: 930 Win2K PCs (1GHz P3, 256/512Mb RAM, 1Gbit e-net) clustered in 30 student cluster rooms across every department on the UCL campus, with the potential to scale up to ~3000 PCs. These machines waste 95% of their CPU cycles 24/7: A MASSIVE UNTAPPED RESOURCE - A COUP FOR eMINERALS! This is where Condor enters the scene. THE ONLY AVAILABLE OFF-THE-SHELF RESOURCE MANAGER AND JOB BROKER FOR WINDOWS: install Condor on our clusters, and we harness 95% of the power of 930+ machines 24 hours a day, without spending any money. Is it really this simple?
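A job for a Windows pool like this is described in a plain-text Condor submit file. The sketch below is illustrative only (the executable and input file names are hypothetical); the vanilla universe and explicit file transfer are what make it suit shared-nothing student PCs:

```
# Hypothetical GULP run on the Windows pool (file names are illustrative)
universe   = vanilla
executable = gulp.exe
arguments  = input.gin

# No shared filesystem on the student PCs, so ship files with the job
transfer_input_files    = input.gin
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Match only the Win2K/P3 machines described above
requirements = (OpSys == "WINNT50") && (Arch == "INTEL")

output = gulp.out
error  = gulp.err
log    = gulp.log
queue
```

A `queue 500` on the last line would turn this into 500 independent jobs, which is exactly the type 2 workload described earlier.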
YES! It has surpassed all expectations, with diverse current use and ever-rising demand. Smiley happy people (our current group of users, and increasing monthly): eMinerals project, eMaterials project, UCL Computer Science, UCL Medical School, University of Marburg, Universities of Bath and Cambridge, Birkbeck College, The Royal Institution…
- Over 900,000 hours of work completed in 6 months (105 CPU-years equivalent and counting)
- Codes migrated to Windows representing huge variety: environmental molecular work (all eMinerals codes!), materials polymorph prediction, financial derivatives research, quantum mechanical codes, climatic research, medical image realisation…
NUMBER 1 METRIC FOR SUCCESS: Users love it. Simple to use, doesn't break, and they can forget about their jobs.
NUMBER 2 METRIC FOR SUCCESS: UCL admin love it. 100% utilisation 24/7 on the entire cluster network, with no drop in performance and negligible costs, satisfies our dyed-in-the-wool, naturally paranoid sys admins.
NUMBER 3 METRIC FOR SUCCESS: eMinerals developers love it: fast deployment, tweakable, can build on top of it, low admin, integratable with Globus, great metadata, great free support, great workflow capabilities, Condor-G.
NUMBER 4 METRIC FOR SUCCESS: eScience loves it. Other institutions are following our example; interest is high.
This is the largest single Condor pool in the UK (according to Condor). This is the first fully cross-departmental institutional Condor pool in the UK. Several other institutions have followed our lead: Cambridge, Cardiff. Much scope for combining resources (flocking, glide-in).
WHAT IS MOST IMPORTANT? Condor ENABLES any scientist to do their work in a way they previously only dreamed about: beginning to make real the ability to match unbounded science with unbounded resources. Condor has slashed time-to-results from years to weeks - scientists using our Condor resource have redefined their ability to achieve their goals. Condor has organised resources at many levels:
Desktop - June 2002 (2 nodes)
Cluster - Sept 2002 (18 nodes)
Department - Jan 2003 (150 nodes)
Campus - October 16th 2003 (930 nodes)
WHERE NEXT - (?????? nodes, ???? pools)… One million Condor nodes in a hollowed-out volcano! Mwahahaha…
…Regional and national Condor resources are next…
Here is an example - CamGrid: The Current Plan
Environment 1: Single pool of ~400 Linux boxes (plus ~500 Windows and Mac OS X machines to follow later). Owned and administered by the University Computing Services (UCS). Small number of submit nodes. X.509 certificate host authentication. No firewalls or private IP addresses.
Environment 2: Desktop and teaching machines (some hundreds) of many colleges and departments, each with a pool. These will be flocked. Heterogeneous mix of architectures and operating systems. Many firewalls and private IP addresses. Hence, use a single VPN (secnet). Each machine has another IP address on the VPN. Each pool has a gateway on the VPN (the only node that needs a public IP). The gateway needs just one UDP port allowed through a firewall. Traffic between pools/gateways is automatically encrypted. The VPN model has already been tested between two flocked pools.
Cardiff University are also following their own similar Condor programme…
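Flocking between such pools is configured declaratively in each pool's condor_config. A minimal sketch, with hypothetical host names: FLOCK_TO names the remote central manager a submit machine may overflow to, and FLOCK_FROM on the remote side lists which submitters are allowed in:

```
## condor_config fragment on the submitting pool (host names illustrative)
FLOCK_TO = camgrid.ucs.cam.ac.uk

## condor_config fragment on the receiving pool's central manager
FLOCK_FROM = submit.geol.ucl.ac.uk
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
```

Jobs then run locally when the home pool has capacity and spill over to the flocked pool otherwise, with no change to the user's submit file.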
…Regional and national Condor resources, continued. Many UK institutions have small/medium Condor pools. Many UK institutions have resources wasting millions of CPU cycles. We have proved the usefulness of large Windows Condor resources. Assurances regarding security, authorisation, authentication, access and reliable job execution are essential to the take-up of Condor on this scale in the UK. Many potential resources are Windows, which complicates matters (for example, the poor GSI port to Windows and the lack of Windows checkpointing). With education, awareness, support and a core group to lead the way, UK institutions can form a national-level Condor infrastructure leveraging HTC resources for scientists within UK eScience. The UK eScience programme is heavily Grid-oriented. What can Condor provide to Grid-enable eMinerals and other UK eScience projects and resources?
Here's the eMinerals answer:
Workflow and scripting: DAGman
Grid connectivity: Condor-G
These two extremely useful Condor tools provide the means to build an integrated, usable eMinerals 'mini-grid' for our scientists, embracing several tools OUT OF THE BOX:
– Globus 2.4: gatekeeper to all compute resources
– PBS: 3 x 16-node Beowulf cluster MPI job queues
– Condor: 2 x pools (UCL and Cambridge)
– SRB: Storage Resource Broker, virtual file system, 4 distributed eMinerals vaults.
Here is what the eMinerals minigrid looks like:
THE eMINERALS MINIGRID (diagram):
- Client machine: command-line Perl-based DAGman/Condor-G submission script generator for PBS and Condor; Condor-G with GT2.4 client and user certificate proxy.
- CGI web interfaces: Condor queue/status and real-time job output viewing.
- University College resources: lake.geol.ucl.ac.uk (Globus gatekeeper, PBS and Condor job managers, SRB); 930-node UCL Condor pool (vanilla, java); 16-node MPI queue.
- Cambridge resources: lake.esc.cam.ac.uk (Globus gatekeeper, PBS and Condor job managers, SRB); 24-node CES Condor pool (std, vanilla); 16-node MPI queue.
- Daresbury resources: SRB MCAT server, SRB vault.
- Non-minigrid facilities (HPCx, JISC clusters, etc.)
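The DAGman/Condor-G combination driving this minigrid can be sketched as follows; the node names and Globus contact string are illustrative, not the project's actual scripts. A DAG file sequences the SRB fetch, the simulation and the SRB store, and each node's submit file may target a remote Globus 2.4 gatekeeper via the globus universe:

```
# workflow.dag -- illustrative three-stage eMinerals-style workflow
JOB  fetch    fetch.sub      # pull input files from an SRB vault
JOB  simulate simulate.sub   # run the simulation code
JOB  store    store.sub      # push results back into SRB
PARENT fetch    CHILD simulate
PARENT simulate CHILD store

# simulate.sub -- Condor-G submit file (GT2-era syntax; names illustrative)
universe        = globus
globusscheduler = lake.esc.cam.ac.uk/jobmanager-condor
executable      = run_siesta.sh
output          = simulate.out
error           = simulate.err
log             = simulate.log
queue
```

Submission is then a single `condor_submit_dag workflow.dag`; DAGman enforces the ordering and writes a rescue DAG if a node fails, so the workflow can be resumed from where it stopped.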
Summary. Condor has enabled eMinerals scientists and their UK colleagues to perform their science:
1. in significantly new ways,
2. on previously untapped resources,
3. on previously unutilised operating systems,
4. in weeks rather than years,
5. in an integrated, heterogeneous, grid-enabled environment,
6. easily, painlessly and at no cost,
7. with equal importance given to data handling,
8. using out-of-the-box tools.
Conclusion: THIS MUST CONTINUE! Condor has an important part to play in the UK eScience programme:
1. Through meeting the increasing demands from users for large-scale, accessible Condor-enabled HTC resources.
2. Through harnessing the significant volumes of existing, under-utilised, heterogeneous UK institutional hardware.
3. Through providing functionality to facilitate secure access to heterogeneous compute and data resources.
4. Through engaging with the UK eScience programme via Condor's grid/web service and standardisation developments.
Thanks for listening! Paul Wilson (eMinerals, UCL) Mark Calleja (eMinerals, Cambridge) Bruce Beckles (CamGrid) eMinerals project