Presentation on theme: "An Overview Grid Computing and Applications"— Presentation transcript:
1 An Overview Grid Computing and Applications. Subject Code: WW Grid. Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab., The University of Melbourne, Melbourne, Australia
2 Overview: Computing platforms and how the Grid is different. Towards global (Grid) computing. Grid resource management and scheduling. Application development challenges. Approaches to Grid computing. Grid applications. Grid projects in GRIDS, Melbourne. Summary and conclusions.
3 Major Networking and Computing Technologies Introduced [Timeline figure, 1960 to 2000. Computing: mainframes, minicomputers, PCs, workstations, Crays, MPPs, XEROX PARC worm, PC clusters, WS clusters, HTC, P2P, PDAs, Grids. Networking: ARPANET, TCP/IP, Ethernet, IETF, HTML, Mosaic, W3C, XML, Web Services; Internet era, WWW era.]
4 Internet: Past, Present, Future [Figure: number of hosts (millions) versus year, 1965 to 2010, annotated "The 'Network Effect' kicks in, and the web goes critical". Phases: 1. Packet switching networks; 2. The Internet is born; 3. The World Wide Web; 4. with XML; 5. The Grid. Milestones: 1969: 4 US universities linked to form ARPANET; 1972: first program created; 1976: Robert Metcalfe develops Ethernet; TCP/IP becomes core protocol; Domain Name System created; IETF created (1986); HTML hypertext system created; CERN launch World Wide Web; NCSA launch Mosaic interface.]
5 Internet and WWW Growth [Figure: number of Internet hosts and WWW servers, logarithmic scale, 1969 to 2000.]
6 Installed base and growth rate for telephone lines, mobile phones, and Internet hosts, 1995 [Table; values not preserved. Columns: installed base and growth rate (%) for phone lines, mobile phones, and Internet hosts. Rows: income groups (Lower, Lower-Middle, Upper-Middle, High) and regions (Africa, Americas, Asia, Europe, Oceania, World).] Source: ACM, Nov. 97 (phones: International Telecommunication Union; hosts: Network Wizards).
11 Cluster of Clusters - Hyperclusters [Figure: Cluster 1, Cluster 2, and Cluster 3 connected over LAN/WAN; labels: Scheduler, Master, Daemon, Execution, Submit, Graphical Control, Clients.]
12 Grid: Towards Internet Computing for (Coordinated) Resource Sharing. Grid enables resource sharing, selection, and aggregation: the unification of geographically distributed resources.
13 What is Grid? A paradigm/infrastructure that enables the sharing, selection, and aggregation of geographically distributed (wide-area) resources: Computers (PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.); Software (e.g., ASPs renting expensive special-purpose applications on demand); Catalogued data and databases (e.g., transparent access to the human genome database); Special devices/instruments (e.g., a radio telescope searching for life in the galaxy); People/collaborators; [depending on their availability, capability, cost, and user QoS requirements] for solving large-scale problems/applications.
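The selection and aggregation steps in this definition can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Resource` fields, the MIPS and cost units, and the sample pool are hypothetical and do not come from any particular Grid toolkit.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    available: bool        # availability
    mips: int              # capability (hypothetical unit)
    cost_per_hour: float   # cost in arbitrary currency units

def select_resources(resources, min_mips, max_cost_per_hour):
    """Return available resources that meet the user's QoS limits, cheapest first."""
    eligible = [r for r in resources
                if r.available
                and r.mips >= min_mips
                and r.cost_per_hour <= max_cost_per_hour]
    return sorted(eligible, key=lambda r: r.cost_per_hour)

def aggregate_mips(resources):
    """Aggregate capability of the selected resources."""
    return sum(r.mips for r in resources)

# Hypothetical pool of geographically distributed resources.
pool = [
    Resource("pc-lab", True, 500, 1.0),
    Resource("cluster-a", True, 8000, 6.0),
    Resource("super-b", False, 50000, 40.0),
    Resource("cluster-c", True, 6000, 4.5),
]
chosen = select_resources(pool, min_mips=1000, max_cost_per_hour=10.0)
print([r.name for r in chosen], aggregate_mips(chosen))
```

A real broker would query an information service for this data instead of using a static list.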
14 P2P/Grid Applications-Drivers. Distributed HPC (Supercomputing): computational science. High-Capacity/Throughput Computing: large-scale simulation/chip design and parameter studies. Content Sharing (free or paid): sharing digital contents among peers (e.g., Napster). Remote software access/renting services: application service providers (ASPs) and Web services. Data-intensive computing: drug design, particle physics, stock prediction... On-demand, real-time computing: medical instrumentation and mission-critical applications. Collaborative Computing: collaborative design, data exploration, education. Service Oriented Computing (SOC): computing as a competitive utility; new paradigm, new industries, and new business.
15 Building and Using Grids requires... Services that make our systems Grid-ready. Security mechanisms that permit resources to be accessed only by authorized users. (New) programming tools that make our applications Grid-ready. Tools that can translate the requirements of an application into requirements for computers, networks, and storage. Tools that perform resource discovery, trading, composition, scheduling, and distribution of jobs, and collect results.
16 A Typical Grid Computing Environment [Figure: Application, Grid Resource Brokers, Grid Information Service, database, and resources R1 to R6 ... RN.]
19 Sources of Complexity in Resource Management for World Wide Computing: Size (large number of nodes, providers, consumers). Heterogeneity of resources (PCs, workstations, clusters, and supercomputers). Heterogeneity of fabric management systems (single system image OS, queuing systems, etc.). Heterogeneity of fabric management policies. Heterogeneity of applications (scientific, engineering, and commerce). Heterogeneity of application requirements (CPU, I/O, memory, and/or network intensive). Heterogeneity in demand patterns. Geographic distribution and different time zones. Differing goals (producers and consumers have different objectives and strategies). Insecure and unreliable environment.
20 Traditional approaches to resource management are NOT useful for the Grid. They use either a centralised policy that needs complete state information and a common fabric management policy, or a decentralised consensus-based policy. Because the Grid has too many heterogeneous parameters, it is impossible to define a system-wide performance matrix and a common fabric management policy that is acceptable to all. So, we propose the use of an "economics" paradigm for managing resources: it has proved successful in managing the decentralization and heterogeneity present in human economies. We can easily leverage proven economic principles and techniques. It is easy to regulate demand and supply. It is user-centric, scalable, and adaptable, with value-driven costing, etc. It offers an incentive (money?) for being part of the Grid!
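As a rough illustration of how a price can regulate demand and supply, the sketch below raises a resource's price when demand exceeds supply and lowers it otherwise. The update rule, the `sensitivity` and `floor` parameters, and the numbers are assumptions chosen for illustration; the slides do not specify a pricing model.

```python
def adjust_price(price, demand, supply, sensitivity=0.5, floor=0.1):
    """Raise the price when demand exceeds supply, lower it otherwise.

    Hypothetical rule: the price moves in proportion to the relative
    excess demand and never drops below `floor`.
    """
    if supply <= 0:
        raise ValueError("supply must be positive")
    excess = (demand - supply) / supply
    return max(floor, price * (1 + sensitivity * excess))

# Demand is 50% above supply, so the price rises by 25%.
print(adjust_price(2.0, demand=150, supply=100))
# Demand is 50% below supply, so the price falls by 25%.
print(adjust_price(2.0, demand=50, supply=100))
```

Each resource owner can apply such a rule locally, which is why no global state or common management policy is needed.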
21 Grid Resource Management systems need to ensure/provide: Site autonomy. Heterogeneous resources and substrate: each resource can be different (SMPs, clusters, Linux, UNIX, Windows, Intel, etc.); resource owners have their own policies or scheduling mechanisms (Codine/Condor). Extend policies through resource brokers. Resource allocation/co-allocation. Online control: can apps (graphics) tolerate non-availability of a resource and adapt themselves?
22 Grid RMS to support: Authentication (once). Specify (code, resources, etc.). Discover resources. Negotiate authorization, acceptable use, cost, etc. Acquire resources. Schedule jobs. Initiate computation. Steer computation. Access remote data-sets. Collaborate with results. Account for usage. [Figure: the discover, negotiate, acquire, schedule, initiate, and steer steps are shown for both Domain 1 and Domain 2.] Ack: Globus.
26 Many Grid Projects & Initiatives
Australia: Nimrod-G, GridSim, Virtual Lab, Gridbus, DISCWorld, ... new coming up.
Europe: UNICORE, MOL, UK eScience, Poland MC Broker, EU Data Grid, EuroGrid, MetaMPI, Dutch DAS, XW, JaWS.
Japan: Ninf, DataFarm.
Korea: ... N*Grid.
USA: Globus, Legion, OGSA, Javelin, AppLeS, NASA IPG, Condor-G, Jxta, NetSolve, AccessGrid, and many more...
Cycle Stealing & .com Initiatives: Distributed.net, ..., Entropia, UD, Parabon, ...
Public Forums: Global Grid Forum, P2P Working Group, IEEE TFCC, Grid & CCGrid conferences.
27 Table 4: Major European Grid Computing Efforts
Initiative: Focus and Technologies Developed
UNICORE: The UNiform Interface to Computer Resources aims to deliver software that allows users to submit jobs to remote high performance computing resources.
MOL: Metacomputer OnLine is a toolbox for the coordinated use of WAN/LAN connected systems. MOL aims at utilizing multiple WAN-connected high performance systems for solving large-scale problems that are intractable on a single supercomputer.
METODIS: Metacomputing Tools for Distributed Systems.
Globe: Globe is a research project aiming to study and implement a powerful unifying paradigm for the construction of large-scale wide area distributed systems: distributed shared objects.
Poznan: Poznan Centre works on development of tools and methods for metacomputing.
Data Grid: This project aims to develop middleware and tools necessary for the data-intensive applications of high-energy physics (grid.web.cern.ch/grid).
MetaMPI: MetaMPI supports the coupling of heterogeneous MPI systems, thus allowing parallel applications developed using MPI to be run on Grids without alteration.
DAS: This is a wide-area distributed cluster, used for research on parallel and distributed computing by five Dutch universities.
JaWS: JaWS is an economy-based computing model where both resource owners and programs using these resources place bids to a central marketplace that generates leases of use (roadrunner.ics.forth.gr).
28 Initiative: Focus and Technologies Developed
Globus: This project is developing basic software infrastructure for computations that integrate geographically distributed computational and information resources.
Legion: Legion is an object-based metasystem. Legion supports transparent scheduling, data management, fault tolerance, site autonomy, and a wide range of security options.
Javelin: Internet-based parallel computing using Java.
AppLeS: This is an application-specific approach to scheduling individual parallel applications on production heterogeneous systems.
NASA IPG: The Information Power Grid is a testbed that provides access to a Grid: a widely distributed network of high performance computers, stored data, instruments, and collaboration environments.
Condor: This project aims to develop, deploy, and evaluate mechanisms and policies that support high throughput computing (HTC) on large collections of distributed computing resources.
Harness: Harness builds on the concept of the virtual machine and explores dynamic capabilities beyond what PVM can supply. It focuses on developing three key capabilities: parallel plug-ins, peer-to-peer distributed control, and multiple virtual machines.
NetSolve: NetSolve is a project that aims to bring together disparate computational resources connected by computer networks. It is an RPC-based client/agent/server system that allows one to remotely access both hardware and software components.
Grid Port: SDSC's Grid Port Toolkit generalises the HotPage infrastructure to develop a reusable portal toolkit (gridport.npaci.edu/).
HotPage: NPACI's HotPage is a user portal that is designed to be a single point of access to computer resources (hotpage.npaci.edu/).
Gateway: Gateway offers a programming paradigm implemented over a virtual Web of accessible resources.
29 Initiative: Focus and Technologies Developed
Ninf: Ninf allows users to access computational resources including hardware, software, and scientific data distributed across a wide area network with an easy-to-use interface (ninf.etl.go.jp).
Bricks: Bricks is a performance evaluation system that allows analysis and comparison of various scheduling schemes on a typical high-performance global computing setting (matsu-www.is.titech.ac.jp/~takefusa/bricks).
30 Initiative: Focus and Technologies Developed
DISCWorld: An infrastructure for service-based metacomputing across LAN and WAN clusters. It allows remote users to log in to this environment over the Web and request access to data, and also to invoke services or operations on the available data (dhpc.adelaide.edu.au/Projects/DISCWorld/).
Nimrod/G & GRACE: A global scheduler (resource broker) for parametric computing over clusters or computational grids.
31 Many Testbeds? And who pays? [Figure labels: $grid, GUSTO, EcoGrid, Legion Testbed, NASA IPG.]
33 Types of Grid Applications: Sequential (dusty deck codes). Data parallel: synchronous (tightly coupled); loosely synchronous. Asynchronous: irregular in time and space; difficult to parallelise to exploit massive parallelism. Embarrassingly parallel.
34 Grid Applications-Drivers. Distributed HPC (Supercomputing): computational science. High-throughput computing: large-scale simulation/chip design and parameter studies. Content sharing: sharing digital contents among peers (e.g., Napster). Remote software access/renting services: application service providers (ASPs). Data-intensive computing: data mining, particle physics (CERN), drug design. On-demand computing: medical instrumentation and network-enabled solvers. Collaborative: collaborative design, data exploration, education.
35 Distributed Supercomputing (SF-Express/MPICH-G, Caltech) [Figure labels: NCSA Origin, Caltech Exemplar, CEWES SP, Maui.] SF-Express distributed interactive simulation: 100K vehicles (2002 goal) using 13 computers, 1386 nodes, 9 sites. Globus mechanisms for: resource allocation; distributed startup; I/O and configuration; security. P. Messina et al., Caltech.
36 SF-Express Architecture [Figure labels: Local Simulation, Router, Interest Mgmt., IP, MPI.] Create synthetic representations of interactive environments. Scalability via interest management. Starting point: MPI and socket communication; hand startup.
37 High Throughput Computing (parameter sweep applications). A study involving exploration of possible scenarios, i.e., execution of the same program for various design alternatives (data). It consists of a large number of tasks (1000s). Generally, no inter-task communication (task farming). Large data files (MBytes+) and I/O constraints. A large class of application areas: parameter explorations and simulations (Monte Carlo); a large number of science, engineering, and commercial applications: astrophysics, drug design, neuroscience, network simulation, structural engineering, automobile crash simulation, aerospace modeling, financial risk analysis. Tools: Condor, Nimrod/G, Distributed.net.
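A parameter sweep of this kind can be sketched locally as a task farm: the same function runs once per parameter combination, with no communication between tasks. This is only an illustration; `simulate` is a hypothetical stand-in for the real application, and a thread pool on one machine stands in for the Grid nodes that tools such as Nimrod/G or Condor would use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def simulate(params):
    """Stand-in for one run of the application (hypothetical model)."""
    speed, angle = params
    return speed * speed * angle  # placeholder computation

def parameter_sweep(speeds, angles, workers=4):
    """Run `simulate` once per (speed, angle) combination, independently."""
    tasks = list(product(speeds, angles))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves task order, so results line up with tasks.
        results = list(pool.map(simulate, tasks))
    return dict(zip(tasks, results))

results = parameter_sweep([10, 20, 30], [1, 2])
best = max(results, key=results.get)
print(len(results), best, results[best])
```

Because the tasks are independent, the same structure scales from a handful of threads to thousands of remote jobs; only the executor changes.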
38 Ad Hoc Mobile Network Simulation: network performance under different microwave frequencies and different weather conditions; uses Nimrod.
39 Drug Design: Data Intensive Computing on Grid [Figure labels: Protein, Molecules, Chemical Databases (legacy, in .MOL2 format).] It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those having the potential to serve as drug candidates.
40 DesignDrug@Home Architecture: A Virtual Lab for "Molecular Modeling for Drug Design" on P2P Grid [Figure. Components: Data Replica Catalogue, Grid Market Directory, Grid Info. Service, Resource Broker (RB), GridTrade Servers (GTS), protein data banks PDB1 and PDB2. Messages: "Give me list of PDB sources of type aldrich_300?"; "service cost?"; "service providers?"; "Screen 2K molecules in 30 min. for $10"; "mol.5 please?"; "get mol.10 from pdb1 & screen it."; "mol.10 please?". The RB maps suitable Grid nodes and Protein DataBank.]
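A request such as "Screen 2K molecules in 30 min. for $10" gives the broker a job count, a deadline, and a budget. The sketch below shows one simple way to plan against those limits: fill the cheapest nodes first, up to what each can finish before the deadline. It is a hedged illustration, not the broker's actual algorithm; the node names, rates, and prices (in cents) are invented.

```python
def plan(num_jobs, deadline_min, budget_cents, nodes):
    """Greedy cost-first allocation under a deadline and a budget.

    nodes: list of (name, jobs_per_minute, cost_cents_per_job), all hypothetical.
    Returns (allocation, total_cost_cents, feasible).
    """
    remaining = num_jobs
    total_cost = 0.0
    allocation = {}
    # Cheapest nodes first.
    for name, jobs_per_minute, cost_per_job in sorted(nodes, key=lambda n: n[2]):
        if remaining == 0:
            break
        capacity = jobs_per_minute * deadline_min  # jobs finishable in time
        take = min(capacity, remaining)
        if take > 0:
            allocation[name] = take
            total_cost += take * cost_per_job
            remaining -= take
    feasible = remaining == 0 and total_cost <= budget_cents
    return allocation, total_cost, feasible

nodes = [("node-a", 30, 0.5), ("node-b", 40, 0.75), ("node-c", 20, 0.25)]
allocation, cost, ok = plan(2000, deadline_min=30, budget_cents=1000, nodes=nodes)
print(allocation, cost, ok)
```

If the plan is infeasible, the user would need to relax the deadline or raise the budget.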
41 MEG (MagnetoEncephaloGraphy) Data Analysis on the Grid: Brain Activity Analysis [Collaboration with Osaka University, Japan]. 64-sensor MEG. Analysis of all pairs (64x64) of MEG data by shifting the temporal region of MEG data over time, 0 to 29750: 64x64x29750 jobs. [Figure: steps numbered 1 to 5 linking Data Generation, Data Analysis, Results, and Nimrod-G with [deadline, budget, optimization preference]; participants: Life-electronics laboratory, AIST; World-Wide Grid; provision of expertise in the analysis of brain function; provision of MEG analysis.]
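The job count quoted on this slide can be checked with a few lines, and the job space can be enumerated lazily instead of being stored. The tuple layout `(sensor_a, sensor_b, shift)` is an assumption made for illustration; the slide gives only the 64x64x29750 product.

```python
from itertools import islice, product

SENSORS = 64     # MEG sensors, from the slide
SHIFTS = 29750   # temporal shifts, from the slide

def job_count(sensors=SENSORS, shifts=SHIFTS):
    """Number of jobs: all sensor pairs times all temporal shifts."""
    return sensors * sensors * shifts

def jobs(sensors=SENSORS, shifts=SHIFTS):
    """Lazily yield (sensor_a, sensor_b, shift) job descriptors."""
    return product(range(sensors), range(sensors), range(shifts))

print(job_count())
print(list(islice(jobs(), 3)))
```

At roughly 122 million jobs, generating descriptors on demand matters more than it would for a sweep of a few thousand tasks.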
42 SETI@home: Search for Extraterrestrial Intelligence at Home
46 Parallelisation of Image Rendering. Image splitting (by rows, columns, or checkerboard). Each segment can be processed concurrently on a different node, and the image is rendered as segments complete.
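The three splitting schemes can be sketched as functions that return tile bounds; each tile could then be handed to a different node. The tuple layout `(row_start, row_end, col_start, col_end)` and the function names are assumptions for illustration.

```python
def split_rows(height, width, parts):
    """Horizontal bands as (row_start, row_end, col_start, col_end)."""
    bounds = [height * i // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1], 0, width) for i in range(parts)]

def split_columns(height, width, parts):
    """Vertical bands as (row_start, row_end, col_start, col_end)."""
    bounds = [width * i // parts for i in range(parts + 1)]
    return [(0, height, bounds[i], bounds[i + 1]) for i in range(parts)]

def split_checker(height, width, rows, cols):
    """Checkerboard tiles: every row band crossed with every column band."""
    return [(r0, r1, c0, c1)
            for (r0, r1, _, _) in split_rows(height, width, rows)
            for (_, _, c0, c1) in split_columns(height, width, cols)]

def area(tiles):
    """Total pixel count covered by the tiles."""
    return sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in tiles)

tiles = split_checker(480, 640, rows=3, cols=4)
print(len(tiles), area(tiles) == 480 * 640)
```

Integer bounds computed this way cover the image exactly, even when the size is not divisible by the number of parts.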
47 Scheduling (needs load balancing). Each row takes a different time to render depending on the nature of the image; e.g., rows across the sky take less time to render than those that intersect the interesting parts of the image. Rendering apps can be implemented and scheduled using MPI, PVM, or p-study tools like Nimrod.
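Why load balancing matters here can be shown with a small simulation comparing a static split (each worker gets a fixed contiguous block of rows) against dynamic scheduling (each idle worker pulls the next row). The row costs are invented to mimic cheap sky rows and expensive detailed rows; this is a sketch, not a measurement.

```python
import heapq

def static_makespan(row_costs, workers):
    """Finish time when rows are pre-assigned in equal contiguous blocks."""
    n = len(row_costs)
    bounds = [n * i // workers for i in range(workers + 1)]
    return max(sum(row_costs[bounds[i]:bounds[i + 1]]) for i in range(workers))

def dynamic_makespan(row_costs, workers):
    """Finish time when each idle worker pulls the next row from a shared queue."""
    finish = [0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in row_costs:
        earliest = heapq.heappop(finish)          # worker that frees up first
        heapq.heappush(finish, earliest + cost)   # it takes the next row
    return max(finish)

# Hypothetical costs: six cheap "sky" rows followed by six expensive rows.
rows = [1] * 6 + [5] * 6
print(static_makespan(rows, 2), dynamic_makespan(rows, 2))
```

With two workers the static split leaves one worker idle most of the time, while the dynamic scheme reaches the ideal finish time for this input.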
48 Data Intensive Computing e.g., CERN Data Grid initiative
49 CERN Large Hadron Collider: a circular particle accelerator to be placed in a 27 km long tunnel in 2005.
50 Conclude with a comparison with the Electrical Grid... Where are we?
51 Fresco by N. Cianfanelli (1841): Alessandro Volta in Paris in 1801, inside the French National Institute, shows the battery in the presence of Napoleon I. (Zoological Section "La Specola" of the Natural History Museum of Florence University)
52 "...and in the future, I imagine a worldwide Power (Electrical) Grid..." "What?!?!" "Oh, mon Dieu!" "This is a mad man..."
54 Grid Computing: A New Wave? What will be the dominant Grid approach in the near future?
55 "The Computational Grid" is analogous to the Electricity (Power) Grid, and the vision is to offer (almost) dependable, consistent, pervasive, and inexpensive access to high-end resources irrespective of their physical location and the location of access.
56 Trends. It is very difficult to predict the future, and this is particularly true in a field such as Information Technology. "I think there is a world market for about five computers." Thomas J. Watson Sr., IBM Founder, 1943
57 Trends: Grid. The time is exciting, but the way ahead may be hard and long!
58 The Grid Impact! "The global computational grid is expected to drive the economy of the 21st century similar to the electric power grid that drove the economy of the 20th century"
59 Future Grid Scenarios. Access to any resources, for anyone, anywhere, anytime, from any platform: portal (super) computing. Application access to resources from the wall socket! Many applications provide solutions in real time. Choice of working: office vs. home vs. ... Collaboratories for distributed teams. Monitoring and steering applications through wireless devices (PDAs etc.).
60 Final Summary. There are currently a large number of projects and a diverse range of emerging Grid developmental approaches being pursued. These range from metacomputing frameworks to application testbeds, and from collaborative environments to batch submission mechanisms.
61 Conclusions. HPC will be dominated by Peer-to-Peer Grids of clusters. Adaptive, scalable, and easy-to-use systems and end-user applications will be prominent. Access electricity, internet, entertainment (music, movies, ...), etc. from the wall socket! Economics-based, service-oriented Grid computing is needed for the eventual success of Grids! The impact of the Grid on the 21st century economy will be the same as that of electricity on the 20th century economy.
62 Further Information. Books: High Performance Cluster Computing, V1, V2, R. Buyya (Ed), Prentice Hall, 1999. The GRID, I. Foster and C. Kesselman (Eds), Morgan-Kaufmann, 1999. IEEE Task Force on Cluster Computing. GRID Forums. CCGRID 2001. GRID Meeting.
63 Further Information. Cluster Computing Infoware. Grid Computing Infoware. IEEE DS Online, Grid Computing area. Millennium Compute Power Grid/Market Project.