Presentation on theme: "Grid Computing: like herding cats?" — Presentation transcript:
1 Grid Computing: like herding cats?
Stephen Jarvis, High Performance Systems Group, University of Warwick, UK
2 Sessions on Grid
What are we going to cover today?
A brief history
Why we are doing it
Applications
Users
Challenges
Middleware
What are we going to cover next week?
A technical talk on the specifics of our work, including application to e-Business and e-Science
3 An Overused Analogy: the Electrical Power Grid
Computing power might somehow be like electrical power: plug in, switch on, and have access to unlimited power.
We don't know who supplies the power, or where it comes from; we just pick up the bill at the end of the month.
Is this the future of computing?
1. We think that the world we are in today is the way it has always been - not the case.
2. The development of societal structure is bound up with infrastructure.
3. So, one question: why is there a Chicago? Onions, lakes, mosquitoes - not very promising.
4. In the US at that time, farming was starting: clearing forests, monoculture farming, the bison killed.
5. The Great Lakes and rivers were the previous means of travel.
6. The emergence of railroads linked the lakes and new lines.
7. Chicago became a transit centre - a cache for goods.
8. People started to exchange different goods - empowered by the infrastructure.
9. New institutions were formed, such as the CBT (Chicago Board of Trade) financial organisation.
10. "Grid" technologies such as refrigerated vans made export easier, cheaper, further ...
11. New "middleware" created unexpected industries (great retailing etc.).
12. If there was ever a city that resulted from the emergence of infrastructure, it is Chicago.
4 Sounds great - but how long?
Is the computing infrastructure available?
Computing power:
1986: Cray X-MP ($8M)
2000: Nintendo 64 ($149)
2003: Earth Simulator (NEC), ASCI Q (LANL)
2005: Blue Gene/L (IBM), 360 Teraflops
Look at the current supercomputer lists!
1. It took Chicago over 100 years to do this.
2. Computing and communications are driven by exponential growth (it's on steroids).
3. For example, the Cray X-MP (1986) cost $8 million, had its own power substation and special cooling, and no graphics.
4. Its connection was a 56 Kb/s NSF link.
5. The Nintendo 64 in 2000 had the same processing power as the Cray X-MP and cost $149.
6. It uses 5 watts, not 60,000 watts.
7. It has 3-dimensional graphics.
8. Those with broadband or similar have more bandwidth available than the NSF did only years ago.
9. Think of today's compute power.
10. Earth Simulator (40 Tflops peak, 35 sustained).
11. ASCI Q at LANL (20 Tflops peak, 13 sustained).
5 Storage & Network
Storage capabilities:
1986: Local data stores (MB)
2002: Goddard Earth Observation System - 29 TB
Network capabilities:
1986: NSFNET 56 Kb/s backbone
1990s: Upgraded to 45 Mb/s (gave us the Internet)
2000s: 40 Gb/s
1. Massive increase in storage capabilities.
2. In 1986, local data stores were of the order of kilobytes/megabytes.
3. The NASA-funded Goddard Earth Observation System stores huge data sets (of terabyte order).
4. Networking and communication have grown massively.
5. NSFNET was upgraded to 45 Mb/s in the early '90s, leading to the Internet of today.
6. Modern high-speed networks are gigabit.
6 Many Potential Resources
[Diagram: terabyte databases, space telescopes, 50M mobile phones, millions of PCs at 30% utilisation, 10k PS2s per week, and supercomputing centres - all feeding into the GRID]
1. Lots of different resource types (heterogeneous resources).
2. Mobile devices, large data sets, instrumentation, PCs, clusters, supercomputers, even PlayStations.
3. Can these be linked in some sensible way?
4. Well, it isn't as simple as railroads, and not all of this can be done.
5. Sometimes the application can make it easy - for example, one which is highly partitionable.
6. Most apps aren't like that - plus they will have large data requirements.
7 Some History: NASA's Information Power Grid
The vision (mid '90s):
to promote a revolution in how NASA addresses large-scale science and engineering
by providing a persistent HPC infrastructure.
Computing and data management services:
on-demand
locate and co-schedule multi-Center resources
address large-scale and/or widely distributed problems.
Ancillary services:
workflow management and coordination
security, charging ...
8 Whole system simulations are produced by coupling all of the sub-system simulations
[Diagram of coupled sub-system models:
Stabilizer Models - lift capabilities, drag capabilities, responsiveness
Airframe Models
Human Models - crew capabilities: accuracy, perception, stamina, reaction times, SOPs
Engine Models - thrust performance, reverse thrust performance, responsiveness, fuel consumption
Landing Gear Models - braking performance, steering capabilities, traction, dampening capabilities]
9 [Diagram: the NASA IPG testbed - sites including GRC, GSFC, LaRC, JPL, MSFC, JSC, KSC, EDC, NCSA, SDSC, CMU and Boeing, linked by NREN, NGIX and NTON-II/SuperNet; resources include O2000 clusters, a 300-node Condor pool, MCAT/SRB, MDS information services and Boeing DMF storage]
10 National Air Space Simulation Environment
[Diagram: the Virtual National Air Space (VNAS) couples models hosted at GRC, ARC and LaRC - wing models (44,000 wing runs), engine models (50,000 engine runs), stabilizer models (66,000 stabilizer runs), airframe models (22,000 airframe impact runs), human models (48,000 human crew runs) and landing gear models (132,000 landing/take-off gear runs) - with simulation drivers: FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data and surface data, covering the 22,000 commercial US flights a day. Being pulled together under the NASA AvSP Aviation ExtraNet (AEN).]
11 What is a Computational Grid?
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.
The capabilities need not be high end.
The infrastructure needs to be relatively transparent.
1. Computational grids are for computing (naturally).
2. Capability computing (productivity versus high performance).
13 The Big Spend: two examples
US TeraGrid:
$100 million US dollars (so far ...)
5 supercomputer centres
New ultra-fast optical network, up to 40 Gb/s
Grid software and parallel middleware
Coordinated virtual organisations
Scientific applications and users
UK e-Science Grid:
£250 million (so far ...)
Regional e-Science centres
New infrastructure
Middleware development
Big science projects
SuperJANET4
14 e-Science Grid
[Map of UK e-Science centres: Edinburgh, Glasgow, DL, Newcastle, Lancaster, White Rose, Belfast, Manchester, Birmingham/Warwick, Cambridge, Oxford, UCL, Bristol, RL, Hinxton, Cardiff, London, Southampton]
15 Who wants Grids and why?
NASA:
Aerospace simulations, air traffic control
NWS, in-aircraft computing
Virtual airspace
Free flight, accident prevention
IBM:
On-demand computing infrastructure
Protect software
Support business computing
Governments:
Simulation experiments
Biodiversity, genomics, military, space science ...
16 Classes of Grid applications
Distributed supercomputing - examples: DIS, stellar dynamics, chemistry; characteristics: very large problems, lots of CPU and memory.
High throughput - examples: chip design, cryptography; characteristics: harnessing idle resources.
On demand - examples: medical, weather prediction; characteristics: remote resources, time bounded.
Data intensive - examples: physics, sky surveys; characteristics: synthesis of new information.
Collaborative - examples: data exploration, virtual environments; characteristics: connection between many parties.
Notes: DIS = Distributed Interactive Simulation. High throughput: loosely-coupled or independent tasks - get CPUs to work (SETI). Data intensive: synthesising, creating new information; petabytes per year.
17 Classes of Grid
Data Grid - example: EU DataGrid; characteristics: lots of data sources from one site, processing off site.
Compute Grid - examples: chip design, cryptography; characteristics: harnessing and connecting rare resources.
Scavenging Grid - example: SETI; characteristics: CPU cycle stealing, commodity resources.
Enterprise Grid - example: banking; characteristics: multi-site, but one organisation.
18 Using Distributed Resources
[Diagram: the Discovery Net project - scientific discovery in real time from scientific information sources (literature, databases, operational data, images, instruments) via real-time integration, dynamic application integration, workflow construction and interactive visual analysis]
19 Nucleotide Annotation Workflows
[Diagram: execute a distributed annotation workflow across sources such as NCBI, EMBL, TIGR, SNP, InterPro, SMART, SWISS-PROT, GO and KEGG - download the sequence from a reference server, execute the distributed annotation workflow, save the results to a distributed annotation server]
1800 clicks, 500 web accesses, 200 copy/pastes and 3 weeks of work - replaced by 1 workflow and a few seconds of execution.
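To make the "one workflow" idea concrete, here is a minimal sketch of such a pipeline. The function names, the service subset and the stubbed bodies are all illustrative assumptions, not the Discovery Net implementation.

```python
# Illustrative sketch only: hypothetical helpers, not a real annotation API.
# It shows how the manual steps above (download, annotate against several
# services, save) can collapse into a single scripted workflow.

from concurrent.futures import ThreadPoolExecutor

ANNOTATION_SERVICES = ["NCBI", "EMBL", "InterPro", "KEGG"]  # subset, for illustration

def fetch_sequence(accession):
    """Download a nucleotide sequence from a reference server (stubbed)."""
    return f">{accession}\nACGT..."  # placeholder record

def annotate(service, sequence):
    """Query one remote annotation service (stubbed)."""
    return {"service": service, "hits": []}

def save_annotations(accession, annotations):
    """Persist merged results to a distributed annotation server (stubbed)."""
    print(f"Saved {len(annotations)} annotation sets for {accession}")

def run_workflow(accession):
    sequence = fetch_sequence(accession)
    # Fan the sequence out to the annotation services in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: annotate(s, sequence), ANNOTATION_SERVICES))
    save_annotations(accession, results)

if __name__ == "__main__":
    run_workflow("AB012345")
```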
20 Grand Challenge: Integrating Different Levels of Simulation
Molecular, cellular and organism levels of simulation (Sansom et al. (2000) Trends Biochem. Sci. 25:368).
An e-Science challenge - non-trivial.
The NASA IPG is a possible paradigm.
Need to integrate rigorously if we are to deliver accurate and hence biomedically useful results (Noble (2002) Nature Rev. Mol. Cell Biol. 3:460).
21 Classes of Grid users
End users - purpose: solve problems; makes use of: applications; concerns: transparency, performance.
Application developers - purpose: develop applications; makes use of: programming models, tools; concerns: ease of use, performance.
Tool developers - purpose: develop tools and programming models; makes use of: grid services; concerns: adaptivity, security.
Grid developers - purpose: provide grid services; makes use of: existing grid services; concerns: connectivity, security.
System administrators - purpose: management of resources; makes use of: management tools; concerns: balancing concerns.
22 Grid architecture
Composed of a hierarchy of sub-systems; scalability is vital.
Key elements (an illustrative sketch of the hierarchy follows this slide):
End systems - single compute nodes, storage systems, IO devices etc.
Clusters - homogeneous networks of workstations; parallel and distributed management.
Intranets - heterogeneous collections of clusters; geographically distributed.
Internet - interconnected intranets; no centralised control.
1. The Grid has a hierarchical structure, similar to the Internet principle of autonomous systems.
2. Sub-systems are responsible for themselves, but fit together using standards.
3. Scalability is vital.
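A minimal sketch of the four-level hierarchy named on the slide; the class names and fields below are invented for illustration and are not part of any grid middleware.

```python
# End systems sit inside clusters, clusters inside intranets, and intranets
# federate into an internet-scale grid with no central control.

from dataclasses import dataclass, field
from typing import List

@dataclass
class EndSystem:
    name: str
    cpus: int
    storage_gb: int

@dataclass
class Cluster:                       # single administrative domain, homogeneous nodes
    name: str
    nodes: List[EndSystem] = field(default_factory=list)

@dataclass
class Intranet:                      # one organisation, heterogeneous clusters
    organisation: str
    clusters: List[Cluster] = field(default_factory=list)

@dataclass
class Grid:                          # interconnected intranets, no centralised control
    intranets: List[Intranet] = field(default_factory=list)

    def total_cpus(self) -> int:
        # Aggregate a property bottom-up through the hierarchy.
        return sum(node.cpus
                   for intranet in self.intranets
                   for cluster in intranet.clusters
                   for node in cluster.nodes)

grid = Grid([Intranet("Warwick", [Cluster("hpsg", [EndSystem("n0", 4, 500)])])])
print(grid.total_cpus())  # -> 4
```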
23 End Systems
State of the art:
Privileged OS; complete control of resources and services.
Integrated nature allows high performance.
Plenty of high-level languages and tools.
Future directions:
End systems currently lack features for integration into larger systems.
OS support for distributed computation.
Mobile code (sandboxing).
Reduction in network overheads.
24 Clusters
State of the art:
High-speed LAN, 100s or 1000s of nodes.
Single administrative domain.
Programming libraries like MPI (a minimal example follows this slide).
Inter-process communication, co-scheduling.
Future directions:
Performance improvements.
OS support.
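As a concrete illustration of the inter-process communication such libraries provide, here is a small MPI-style task farm using mpi4py; the choice of mpi4py and the toy workload are assumptions of the sketch (the slide only names MPI).

```python
# Run with, for example: mpiexec -n 4 python ping.py

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's id within the job
size = comm.Get_size()        # number of cooperating processes

if rank == 0:
    # Rank 0 hands a small piece of work to every other rank...
    for dest in range(1, size):
        comm.send({"task": dest}, dest=dest, tag=0)
    # ...and collects the results.
    results = [comm.recv(source=src, tag=1) for src in range(1, size)]
    print("collected:", results)
else:
    task = comm.recv(source=0, tag=0)
    comm.send(task["task"] ** 2, dest=0, tag=1)
```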
25 Intranets
State of the art:
Grids of many resources, but one administrative domain.
Management of heterogeneous resources.
Data sharing (e.g. databases, web services).
Supporting software environments, including CORBA.
Load-sharing systems such as LSF and Condor.
Resource discovery.
Future directions:
Increasing complexity (physical scale etc.).
Performance.
Lack of global knowledge.
26 Internets
State of the art:
Geographical distribution; no central control.
Data sharing is very successful.
Management is difficult.
Future directions:
Sharing other computing services (e.g. computation).
Identification of resources.
Transparency.
Internet services.
27 Basic Grid services
Authentication:
Can the users use the system; what jobs can they run?
Acquiring resources:
What resources are available?
Resource allocation policy; scheduling.
Security:
Is the data safe? Is the user process safe?
Accounting:
Is the service free, or should the user pay?
(A small illustrative interface for these services follows this slide.)
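The sketch below is a hypothetical, heavily simplified broker interface - not Globus or any other real middleware API - included only to make the service categories above concrete; security and sandboxing of user processes are left out.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Job:
    owner: str
    cpus: int
    hours: float

class GridBroker:
    def __init__(self, resources: Dict[str, int]):
        self.resources = resources            # resource name -> free CPUs
        self.usage: Dict[str, float] = {}     # accounting ledger: user -> CPU-hours

    def authenticate(self, user: str, credential: str) -> bool:
        # Authentication: can this user use the system at all? (stub check)
        return credential == f"token-for-{user}"

    def discover(self, job: Job) -> List[str]:
        # Acquiring resources: which sites currently have enough free CPUs?
        return [name for name, free in self.resources.items() if free >= job.cpus]

    def allocate(self, job: Job, site: str) -> None:
        # A real allocation policy/scheduler would live here; this just reserves
        # CPUs and records usage so the user can be charged (or not) later.
        self.resources[site] -= job.cpus
        self.usage[job.owner] = self.usage.get(job.owner, 0.0) + job.cpus * job.hours

broker = GridBroker({"clusterA": 64, "clusterB": 16})
job = Job(owner="alice", cpus=32, hours=2.0)
if broker.authenticate("alice", "token-for-alice"):
    site = broker.discover(job)[0]   # -> "clusterA"
    broker.allocate(job, site)
    print(broker.usage)              # -> {'alice': 64.0}
```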
28 Research Challenges (#1)
Grid computing is a relatively new area; there are many challenges.
Nature of applications:
New methods of scientific and business computing.
Programming models and tools:
Rethinking programming, algorithms, abstraction etc.
Use of software components/services.
System architecture:
Minimal demands should be placed on contributing sites.
Scalability.
Evolution of future systems and services.
29 Research Challenges (#2)
Problem-solving methods:
Latency- and fault-tolerant strategies.
Highly concurrent and speculative execution.
Resource management:
How are the resources shared?
How do we achieve end-to-end performance?
Need to specify QoS requirements, then translate these to the resource level.
Contention?
30 Research Challenges (#3)
Security:
How do we safely share data, resources and tasks?
How is code transferred?
How does licensing work?
Instrumentation and performance:
How do we maintain good performance?
How can load-balancing be controlled?
How do we measure grid performance?
Networking and infrastructure:
Significant impact on networking.
Need to combine high and low bandwidth.
31 Development of middleware
Many people see middleware as the vital ingredient.
Globus Toolkit:
Component services for security, resource location, resource management, information services.
OGSA:
Open Grid Services Architecture.
Drawing on web services technology.
GGF (Global Grid Forum):
International organisation driving Grid development.
Includes partners such as Microsoft, IBM, NASA etc.
32 Middleware Conceptual Layers
Applications: workload generation, visualization ...
Middleware: discovery, mapping, scheduling, security, accounting ...
Resources: computing, storage, instrumentation ...
1. What is middleware?
2. It is the bit in between the resources and the users.
3. The glue, if you like.
33 Requirements include:
Offers up useful resources.
Accessible and useable resources.
Stable and adequately supported.
A single-user 'laptop feel'.
Middleware has much of this responsibility.
34 Demanding management issues
Users are (currently) likely to be sophisticated, but probably not computer 'techies'.
Need to hide detail and 'obscene' complexity.
Provide the vision of access to full resources.
Provide contracts for level(s) of support (SLAs).
35 Key Interface between Applications & Machines
Gate keeper / manager:
Acts as the resource manager.
Responsible for mapping applications to resources.
Scheduling tasks.
Ensuring service level agreements (SLAs).
Distributed / dynamic.
36 Middleware Projects
Globus, Argonne National Laboratory, USA
AppLeS, UC San Diego, USA
Open Grid Services Architecture (OGSA)
ICENI, Imperial, UK
Nimrod, Melbourne, Australia
Many others ... including us!
1. Middleware is a complex problem attracting a lot of attention.
37 HPSG's approach:
Determine what resources are required (advertise).
Determine what resources are available (discovery).
Map requirements to available resources (scheduling).
Maintain a contract of performance (service level agreement).
Performance drives the middleware decisions - PACE.
(A sketch of this performance-driven loop follows this slide.)
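A minimal sketch of the advertise/discover/schedule/SLA loop described above, with a toy performance model standing in for a real predictor such as PACE; all resource names and numbers are invented for the example.

```python
from typing import Dict, Optional

def predict_runtime(task_ops: float, resource_gflops: float) -> float:
    # Stand-in performance model: predicted seconds = work / speed.
    return task_ops / resource_gflops

def schedule(task: Dict, resources: Dict[str, float], sla_deadline_s: float) -> Optional[str]:
    """Map one task to the resource with the best predicted runtime,
    provided the prediction meets the SLA deadline."""
    # Discovery: which advertised resources are currently available?
    available = {name: speed for name, speed in resources.items() if speed > 0}
    if not available:
        return None
    # Scheduling: choose the resource with the smallest predicted runtime.
    best = min(available, key=lambda name: predict_runtime(task["ops"], available[name]))
    # SLA check: only accept the mapping if the prediction meets the contract.
    if predict_runtime(task["ops"], available[best]) <= sla_deadline_s:
        return best
    return None

resources = {"clusterA": 250.0, "clusterB": 80.0}      # advertised Gflop/s (made up)
task = {"name": "wing-run", "ops": 12_000.0}           # Gflop of work (made up)
print(schedule(task, resources, sla_deadline_s=60.0))  # -> "clusterA"
```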
38 High Performance Systems Group, Warwick
'[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.'
Tony Blair, 2002
39 And herding cats ...
100,000s of computers
Satellite links, miles of networking
Space telescopes, atomic colliders, medical scanners
Terabytes of data
A software stack a mile high ...