Stephen Jarvis High Performance Systems Group University of Warwick, UK Grid Computing: like herding cats?
2 Sessions on Grid: What are we going to cover today? – A brief history – Why we are doing it – Applications – Users – Challenges – Middleware. What are you going to cover next week? – A technical talk on the specifics of our work – Including applications to e-Business and e-Science
3 An Overused Analogy: the Electrical Power Grid. Computing power might somehow be like electrical power – plug in – switch on – have access to unlimited power. We don't know who supplies the power, or where it comes from – we just pick up the bill at the end of the month. Is this the future of computing?
4 Is the computing infrastructure available? Computing power – 1986: Cray X-MP ($8M) – 2000: Nintendo-64 ($149) – 2003: Earth Simulator (NEC), ASCI Q (LANL) – 2005: Blue Gene/L (IBM), 360 Teraflops – Look at the current supercomputer rankings! Sounds great – but for how long?
5 Storage & Network. Storage capabilities – 1986: Local data stores (MB) – 2002: Goddard Earth Observation System – 29TB. Network capabilities – 1986: NSFNET 56Kb/s backbone – 1990s: Upgraded to 45Mb/s (gave us the Internet) – 2000s: 40 Gb/s
6 Many Potential Resources for the GRID – Terabyte databases – space telescopes – millions of PCs (30% utilisation) – supercomputing centres – 10k PS/2 per week – 50M mobile phones
Some History: NASA's Information Power Grid. The vision … mid 90s – to promote a revolution in how NASA addresses large-scale science and engineering – by providing a persistent HPC infrastructure. Computing and data management services – on-demand – locate and co-schedule multi-Center resources – address large-scale and/or widely distributed problems. Ancillary services – workflow management and coordination – security, charging …
[Diagram: whole-system simulations are produced by coupling all of the sub-system simulations – engine models (thrust performance, reverse thrust performance, responsiveness, fuel consumption), airframe models (lift capabilities, drag capabilities, responsiveness), landing gear models (braking performance, steering capabilities, traction, damping capabilities), stabilizer models, and human crew models (accuracy, perception, stamina, reaction times, SOPs)]
[Diagram: NASA IPG testbed – sites including SDSC, LaRC, GSFC, MSFC, KSC, JSC, NCSA, Boeing, JPL, EDC, CMU and GRC (300-node Condor pool), linked via NGIX, NREN and NTON-II/SuperNet, with MCAT/SRB storage, DMF, MDS information services, a CA, and O2000 clusters]
National Air Space Simulation Environment: a Virtual National Air Space (VNAS) coupling GRC engine models, LaRC airframe and landing gear models, and ARC wing, stabilizer and human models with FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data and surface data. Simulation drivers: 22,000 commercial US flights a day means 50,000 engine runs, 22,000 airframe impact runs, 132,000 landing/take-off gear runs, 48,000 human crew runs, 66,000 stabilizer runs and 44,000 wing runs (being pulled together under the NASA AvSP Aviation ExtraNet (AEN))
What is a Computational Grid? A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities. In practice the capabilities need not be high-end, but the infrastructure does need to be relatively transparent.
Selected Grid Projects US Based – NASA Information Power Grid – DARPA CoABS Grid – DOE Science Grid – NSF National Virtual Observatory – NSF GriPhyN – DOE Particle Physics Data Grid – NSF DTF TeraGrid – DOE ASCI DISCOM Grid – DOE Earth Systems Grid etc… EU Based – DataGrid (CERN,..) – EuroGrid (Unicore) – Damien (Metacomputing) – DataTag (TransAtlanticTestbed, …) – Astrophysical Virtual Observatory – GRIP (Globus/Unicore) – GRIA (Industrial applications) – GridLab (Cactus Toolkit,..) – CrossGrid (Infrastructure Components) – EGSO (Solar Physics) Other National Projects – UK - e-Science Grid – Netherlands – VLAM-G, DutchGrid – Germany – UNICORE Grid, D-Grid – France – Etoile Grid – Italy – INFN Grid – Eire – Grid-Ireland – Scandinavia - NorduGrid – Poland – PIONIER Grid – Hungary – DemoGrid – Japan – JpGrid, ITBL – South Korea – N*Grid – Australia – Nimrod-G, …. – Thailand – Singapore – AsiaPacific Grid
The Big Spend: two examples – US TeraGrid: $100 million US dollars (so far…) – 5 supercomputer centres – new ultra-fast 40Gb/s optical network – Grid software and parallel middleware – coordinated virtual organisations – scientific applications and users. UK e-Science Grid: £250 million (so far…) – regional e-Science centres – new infrastructure – middleware development – big science projects – SuperJANET4
[Map: UK e-Science Grid centres – Edinburgh, Glasgow, Newcastle, Belfast, Lancaster, White Rose, Manchester, DL, Birmingham/Warwick, Cambridge, Hinxton, Oxford, RL, Cardiff, Bristol, UCL, London, Soton]
15 Who wants Grids and why? NASA – aerospace simulations, air traffic control – NWS, in-aircraft computing – virtual airspace – free fly, accident prevention. IBM – on-demand computing infrastructure – protect software – support business computing. Governments – simulation experiments – biodiversity, genomics, military, space science…
16 Classes of Grid applications – Category / Examples / Characteristics: Distributed supercomputing / DIS, stellar dynamics, chemistry / very large problems, lots of CPU and memory. High throughput / chip design, cryptography / harnessing idle resources. On demand / medical, weather prediction / remote resources, time-bounded. Data intensive / physics, sky surveys / synthesis of new information. Collaborative / data exploration, virtual environments / connection between many parties.
17 Classes of Grid – Category / Examples / Characteristics: Data Grid / EU DataGrid / lots of data sources from one site, processing off site. Compute Grid / chip design, cryptography / harnessing and connecting rare resources. Scavenging Grid / SETI / CPU cycle stealing, commodity resources. Enterprise Grid / banking / multi-site, but one organisation.
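The "scavenging grid" row is the easiest to make concrete: a worker claims a work unit only when the host machine looks idle, so spare desktop cycles are harvested without disturbing the owner. A minimal sketch of that policy, with invented thresholds and data (real systems such as SETI@home or Condor use far richer idleness tests):

```python
# Sketch of cycle scavenging: run queued work units only on hosts that
# appear idle. Thresholds, task names and host samples are illustrative.

from collections import deque

def host_is_idle(load_avg, user_active, load_threshold=0.3):
    """A host donates cycles only when lightly loaded and unattended."""
    return load_avg < load_threshold and not user_active

def scavenge(task_queue, host_states):
    """Assign queued work units to whichever sampled hosts are idle."""
    completed = []
    for load_avg, user_active in host_states:
        if not task_queue:
            break
        if host_is_idle(load_avg, user_active):
            completed.append(task_queue.popleft())
    return completed

tasks = deque(["wu-1", "wu-2", "wu-3"])
# (load average, owner at keyboard?) samples for three hosts
states = [(0.1, False), (0.9, False), (0.2, True)]
print(scavenge(tasks, states))  # only the first host qualifies
```

The key design point is that the resource owner's policy, not the grid's demand, decides when work runs.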
Discovery Net Project – scientific discovery in real time: real-time integration of scientific information, dynamic application integration, workflow construction and interactive visual analysis, using distributed resources – literature, databases, operational data, images and instrument data.
Nucleotide Annotation Workflows: download sequence from a Reference Server, execute a distributed annotation workflow (NCBI, EMBL, TIGR, SNP, InterPro, SMART, SWISS-PROT, GO, KEGG), save to a Distributed Annotation Server. 1800 clicks, 500 web accesses and 200 copy/pastes – three weeks of work captured in one workflow with a few seconds' execution.
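The contrast on this slide – thousands of manual interactions versus one re-runnable workflow – is essentially function composition: each manual step (download, annotate, save) becomes a function, and the workflow is their chain. A toy sketch in that spirit; every name and the stand-in data are invented for illustration, not Discovery Net's actual API:

```python
# Hypothetical workflow sketch: manual steps become composable functions.
# Names (download_sequence, save_to_das, the database list) are invented.

def download_sequence(accession):
    """Stand-in for fetching a record from a reference server."""
    return {"id": accession, "seq": "ATGCGTA"}

def annotate(record, databases):
    """Stand-in for querying each annotation database."""
    record["annotations"] = [f"{db}:hit" for db in databases]
    return record

def save_to_das(record):
    """Stand-in for writing to a Distributed Annotation Server."""
    return f"saved {record['id']} with {len(record['annotations'])} annotations"

def workflow(accession):
    record = download_sequence(accession)
    record = annotate(record, ["InterPro", "SMART", "GO", "KEGG"])
    return save_to_das(record)

print(workflow("NM_000001"))  # saved NM_000001 with 4 annotations
```

Once captured this way, re-running against a new sequence is one call rather than weeks of clicking.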
Grand Challenge: Integrating Different Levels of Simulation (molecular, cellular, organism) – an e-science challenge – non-trivial – NASA IPG as a possible paradigm – need to integrate rigorously if it is to deliver accurate and hence biomedically useful results. Noble (2002) Nature Rev. Mol. Cell Biol. 3:460; Sansom et al. (2000) Trends Biochem. Sci. 25:368
21 Classes of Grid users – Class / Purpose / Makes use of / Concerns: End users / solve problems / applications / transparency, performance. Application developers / develop applications / programming models, tools / ease of use, performance. Tool developers / develop tools & programming models / grid services / adaptivity, security. Grid developers / provide grid services / existing grid services / connectivity, security. System administrators / management of resources / management tools / balancing concerns.
22 Grid architecture: composed of a hierarchy of sub-systems; scalability is vital. Key elements: – End systems: single compute nodes, storage systems, IO devices etc. – Clusters: homogeneous networks of workstations; parallel & distributed management – Intranet: heterogeneous collections of clusters; geographically distributed – Internet: interconnected intranets; no centralised control
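The hierarchy above (end systems inside clusters, clusters inside intranets, intranets forming the internet level) can be sketched as nested data structures; the scalability point is that any resource query must aggregate recursively across every level. A minimal sketch, with purely illustrative sizes:

```python
# Sketch of the four-level grid hierarchy. Counting CPUs requires a
# recursive aggregation - the reason scalability is vital. Numbers are
# illustrative only.

from dataclasses import dataclass, field

@dataclass
class EndSystem:                 # single compute node
    cpus: int

@dataclass
class Cluster:                   # homogeneous nodes, one admin domain
    nodes: list = field(default_factory=list)
    def cpus(self): return sum(n.cpus for n in self.nodes)

@dataclass
class Intranet:                  # heterogeneous clusters, one organisation
    clusters: list = field(default_factory=list)
    def cpus(self): return sum(c.cpus() for c in self.clusters)

@dataclass
class Internet:                  # interconnected intranets, no central control
    intranets: list = field(default_factory=list)
    def cpus(self): return sum(i.cpus() for i in self.intranets)

grid = Internet([Intranet([Cluster([EndSystem(4)] * 8)] * 2)] * 3)
print(grid.cpus())  # 4 cpus x 8 nodes x 2 clusters x 3 intranets = 192
```

In a real grid no single party holds this whole tree, which is why the internet level has no centralised control.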
23 End Systems. State of the art – privileged OS; complete control of resources and services – integrated nature allows high performance – plenty of high-level languages and tools. Future directions – current systems lack features for integration into larger systems – OS support for distributed computation – mobile code (sandboxing) – reduction in network overheads
24 Clusters. State of the art – high-speed LAN, 100s or 1000s of nodes – single administrative domain – programming libraries such as MPI – inter-process communication, co-scheduling. Future directions – performance improvements – OS support
25 Intranets. State of the art – grids of many resources, but one admin domain – management of heterogeneous resources – data sharing (e.g. databases, web services) – supporting software environments inc. CORBA – load-sharing systems such as LSF and Condor – resource discovery. Future directions – increasing complexity (physical scale etc.) – performance – lack of global knowledge
26 Internets. State of the art – geographical distribution, no central control – data sharing is very successful – management is difficult. Future directions – sharing other computing services (e.g. computation) – identification of resources – transparency – Internet services
27 Basic Grid services. Authentication – can users use the system; what jobs can they run? Acquiring resources – what resources are available? – resource allocation policy; scheduling. Security – is the data safe? Is the user process safe? Accounting – is the service free, or should the user pay?
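These four services interlock: a job is only placed once the user is authenticated, a resource is allocated, the job passes a security check, and the usage is charged. A toy broker tying them together – the user table, signing check and prices are all invented for illustration, not how Globus or any real middleware implements them:

```python
# Toy grid broker combining the slide's four basic services:
# authentication, resource acquisition, a crude security check, and
# accounting. All data (users, resources, prices) is illustrative.

class Broker:
    def __init__(self):
        self.users = {"alice": "secret"}              # authentication database
        self.free_resources = {"cluster-a", "cluster-b"}
        self.ledger = []                              # accounting records

    def submit(self, user, password, job, signed=True, price=10):
        if self.users.get(user) != password:          # authentication
            return "authentication failed"
        if not signed:                                # security: reject unsigned code
            return "unsigned job rejected"
        if not self.free_resources:                   # resource acquisition
            return "no resources available"
        resource = sorted(self.free_resources)[0]
        self.free_resources.remove(resource)
        self.ledger.append((user, resource, price))   # accounting: charge the user
        return f"{job} running on {resource}"

broker = Broker()
print(broker.submit("alice", "secret", "job-1"))  # job-1 running on cluster-a
```

Real grids replace each `if` with a full subsystem (certificates, schedulers, sandboxes, billing), but the control flow is the same.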
28 Research Challenges (#1). Grid computing is a relatively new area – there are many challenges. Nature of applications – new methods of scientific and business computing. Programming models and tools – rethinking programming, algorithms, abstraction etc. – use of software components/services. System architecture – minimal demands should be placed on contributing sites – scalability – evolution of future systems and services
29 Research Challenges (#2). Problem solving methods – latency- and fault-tolerant strategies – highly concurrent and speculative execution. Resource management – how are the resources shared? – how do we achieve end-to-end performance? – need to specify QoS requirements, then translate them to the resource level – contention?
30 Research Challenges (#3). Security – how do we safely share data, resources, tasks? – how is code transferred? – how does licensing work? Instrumentation and performance – how do we maintain good performance? – how can load-balancing be controlled? – how do we measure grid performance? Networking and infrastructure – significant impact on networking – need to combine high and low bandwidth
31 Development of middleware. Many people see middleware as the vital ingredient. Globus toolkit – component services for security, resource location, resource management, information services. OGSA – Open Grid Services Architecture – drawing on web services technology. GGF – international organisation driving Grid development – contains partners such as Microsoft, IBM, NASA etc.
Requirements include: offers up useful resources – accessible and usable resources – stable and adequately supported – a single-user "laptop" feel. Middleware has much of this responsibility.
Demanding management issues: users are (currently) likely to be sophisticated, but probably not computer techies – need to hide detail & obscene complexity – provide the vision of access to full resources – provide contracts for level(s) of support (SLAs)
Gate Keeper / Manager – the key interface between applications & machines. Acts as resource manager: responsible for mapping applications to resources, scheduling tasks, and ensuring service level agreements (SLAs). Distributed / dynamic.
Middleware Projects: Globus, Argonne National Laboratory, USA – AppLeS, UC San Diego, USA – Open Grid Services Architecture (OGSA) – ICENI, Imperial, UK – Nimrod, Melbourne, Australia – many others... including us!!
37 HPSG's approach: determine what resources are required (advertise) – determine what resources are available (discovery) – map requirements to available resources (scheduling) – maintain a contract of performance (service level agreement). Performance drives the middleware decisions – PACE
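The advertise / discover / schedule / SLA loop can be sketched as simple matchmaking: resources advertise capacity, discovery filters them against a job's requirement, and scheduling picks a match whose predicted performance meets the deadline. Everything below is an invented illustration – in HPSG's actual work the performance predictions would come from PACE, not a one-line throughput model:

```python
# Sketch of advertise -> discovery -> scheduling -> SLA. Adverts and the
# throughput-based performance model are invented for illustration.

adverts = [                        # resources advertise capacity (jobs/hour)
    {"name": "cluster-a", "throughput": 100, "free": True},
    {"name": "cluster-b", "throughput": 40,  "free": True},
]

def discover(min_throughput):
    """Discovery: find advertised resources meeting the requirement."""
    return [a for a in adverts if a["free"] and a["throughput"] >= min_throughput]

def schedule(job_size, deadline_hours):
    """Scheduling: map the job to a resource whose predicted runtime
    meets the SLA deadline; return None if the SLA cannot be met."""
    needed = job_size / deadline_hours
    candidates = discover(needed)
    if not candidates:
        return None
    best = max(candidates, key=lambda a: a["throughput"])
    best["free"] = False           # resource is now committed to this SLA
    return best["name"]

print(schedule(job_size=300, deadline_hours=4))  # needs >= 75 jobs/hour
```

The point of performance-driven middleware is the `needed` calculation: the SLA is translated into a resource-level requirement before any mapping happens.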
38 "[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information." – Tony Blair, 2002. High Performance Systems Group, Warwick
39 And herding cats … – 100,000s of computers – satellite links, miles of networking – space telescopes, atomic colliders, medical scanners – terabytes of data – a software stack a mile high…