1 Grid and e-Science Technologies
Simon Cox, Technical Director, Southampton Regional e-Science Centre

2 Summary
- The Grid problem: resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: protocol and service definitions for interoperability & resource sharing
- Grid middleware
  - Globus Toolkit: a source of protocol and API definitions, and of reference implementations
  - Open Grid Services Architecture: the next step in the evolution
  - Condor: high-throughput computing
  - Web Services & W3C: leveraging e-business
- e-Science projects applying Grid concepts to applications

3 Grid Computing

4 The Grid Problem
"Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" - "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, assuming the absence of:
  - a central location
  - central control
  - omniscience
  - existing trust relationships

5 Why Grids? (1) e-Science
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake-table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real-time data, weather models, and population data

6 Grid Communities & Applications: Data Grids for High Energy Physics
[Figure: tiered data-grid architecture for LHC physics. The online system and physics data cache feed the CERN Computer Centre (Tier 0; offline processor farm ~20 TIPS) at up to ~PBytes/sec; Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, Germany) connect at ~622 Mbits/sec (or air freight, deprecated); Tier 2 centres (e.g. Caltech) are ~1 TIPS each; institute servers (~0.25 TIPS) and physicist workstations (Tier 4) sit below, with links ranging from ~100 MBytes/sec down to ~1 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents.]
- There is a "bunch crossing" every 25 nsecs and 100 "triggers" per second; each triggered event is ~1 MByte in size (see the check below)
- Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
www.griphyn.org www.ppdg.net www.eu-datagrid.org
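A quick check that the trigger numbers quoted above are self-consistent, using only the figures on the slide:

```python
# Back-of-envelope check of the slide's trigger numbers.
bunch_crossing_interval_s = 25e-9                # one bunch crossing every 25 ns
print(f"crossing rate: {1 / bunch_crossing_interval_s:.0e} /s")  # 4e+07 /s (40 MHz)

triggers_per_s = 100                             # events kept by the trigger
event_size_mbytes = 1                            # ~1 MByte per triggered event
print(f"triggered data rate: {triggers_per_s * event_size_mbytes} MBytes/sec")
# 100 MBytes/sec, matching the fastest link shown in the tier diagram
```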

7 Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
- On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid partners: Argonne, Michigan, NCSA, UIUC, USC

8 Online Access to Scientific Instruments
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
[Figure: data from the Advanced Photon Source passes through real-time collection into archival storage and tomographic reconstruction, with wide-area dissemination to desktop & VR clients with shared controls.]

9 Why Grids? (2) e-Business
- Engineers at a multinational company collaborate on the design of a new product
- A multidisciplinary analysis in aerospace couples code and data in four companies
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal & external resources to support e-Business workload

10 [image-only slide; no text transcribed]

11 Grids: Why Now?
- Moore's law → highly functional end-systems
- Ubiquitous Internet → universal connectivity
- Network exponentials produce dramatic changes in geometry and geography
  - 9-month doubling: double Moore's law! (see the check below)
  - 1986-2001: x340,000; 2001-2010: x4,000?
- New modes of working and problem solving emphasize teamwork, computation
- New business models and technologies facilitate outsourcing
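The "9-month doubling" claim can be checked against the quoted x340,000 growth from 1986 to 2001; a minimal sketch:

```python
import math

growth = 340_000                      # quoted growth in network capability, 1986-2001
months = (2001 - 1986) * 12
doublings = math.log2(growth)         # ~18.4 doublings
print(f"one doubling every {months / doublings:.1f} months")
# ~9.8 months, i.e. roughly twice the pace of Moore's law
# (transistor counts doubling about every 18 months)
```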

12 Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Heterogeneity of device, mechanism, policy
  - Sharing is conditional: negotiation, payment, …
- Coordinated problem solving
  - Integration of distributed resources
  - Compound quality-of-service requirements
- Dynamic, multi-institutional virtual organisations
  - Dynamic overlays on classic org structures
  - Map to underlying control mechanisms
http://www.globus.org/research/papers/anatomy.pdf

13 The Grid World: Current Status
- Dozens of major Grid projects in scientific & technical computing/research & education
  - Deployment, application, technology
- Some consensus on key concepts and technologies
  - Open source Globus Toolkit™ a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, with a large tool/user base
- Global Grid Forum a significant force
- Industrial interest emerging rapidly
http://www.gridforum.org

14 Grid Middleware (coordinates and authenticates use of grid services)
- Globus (and GGF grid-computing protocols)
  - Grid Security Infrastructure (GSI)
  - Grid Resource Allocation Manager (GRAM)
  - Grid Resource Information Service (GRIS)
  - Grid Index Information Service (GIIS)
  - GridFTP
  - Metadirectory Service (MDS 2.0+) coupled to an LDAP server
- Condor (distributed high-throughput computing system)
  - Condor-G lets us dispatch jobs to our Globus system
  - Active collaboration with the Condor development team at the University of Wisconsin (Miron Livny)

15 The Globus Project: Making Grid Computing a Reality
- Close collaboration with real Grid projects in science and industry
- Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
- Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
- The Globus Toolkit: open source, reference software base for building grid infrastructure and applications
- Global Grid Forum: development of standard protocols and APIs for Grid computing
http://www.gridforum.org
http://www.globus.org

16 Four Key Protocols
- The Globus Toolkit centers around four key protocols (exercised in the sketch below)
  - Connectivity layer:
    - Security: Grid Security Infrastructure (GSI)
  - Resource layer:
    - Resource management: Grid Resource Allocation Management (GRAM)
    - Information services: Grid Resource Information Protocol (GRIP)
    - Data transfer: Grid File Transfer Protocol (GridFTP)
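In the Globus Toolkit 2 era these protocols were driven through command-line clients. A minimal sketch, assuming the GT2 clients grid-proxy-init, globus-job-run, and globus-url-copy are installed; the host name and file paths are hypothetical:

```python
import subprocess

# GSI: authenticate once and create a short-lived proxy credential.
subprocess.run(["grid-proxy-init"], check=True)

# GRAM: run a command on a (hypothetical) remote gatekeeper.
subprocess.run(["globus-job-run", "grid.example.org", "/bin/hostname"],
               check=True)

# GridFTP: fetch a result file back over the data-transfer protocol.
subprocess.run(["globus-url-copy",
                "gsiftp://grid.example.org/tmp/result.dat",
                "file:///tmp/result.dat"],
               check=True)
```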

17 The Globus Toolkit in One Slide
- Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; the toolkit provides a reference implementation (= Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data management, workflow, …
[Figure: a user process authenticates via GSI (Grid Security Infrastructure) and creates a proxy credential; the proxy makes reliable remote invocations via GRAM (Grid Resource Allocation & Management) to a gatekeeper (factory), which creates user processes and registers them with a reporter (registry + discovery); reporters use soft-state registration and enquiry against a GIIS (Grid Information Index Server) within MDS-2 (Meta Directory Service); further GSI-authenticated remote requests go to other services, e.g. GridFTP.]

18 Globus Toolkit: Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This, plus good engineering, is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on the tools

19 Globus Toolkit: Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting environments)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties

20 [image-only slide; no text transcribed]

21 What is Condor?
- Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility
- Condor uses ClassAd matchmaking to make sure that everyone is happy
- Features
  - Unix and NT
  - Operational since 1986
  - Manages more than 1300 CPUs at UW-Madison
  - Software available free on the web
  - More than 150 Condor installations worldwide in academia and industry
  - Non-dedicated resources
  - Job checkpoint and migration

22 What is High-Throughput Computing?
- High-performance: CPU cycles/second under ideal circumstances
  - "How fast can I run simulation X on this machine?"
- High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances
  - Not "How fast can I run simulation X on this machine?" but "How many times can I run simulation X in the next month using all available machines?"
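A toy comparison, with made-up machine counts and run times, of why throughput can beat raw speed over a month:

```python
# Hypothetical numbers: one fast dedicated machine vs. a pool of slower,
# part-time workstations, measured over a 30-day month.
fast_runs_per_day = 24          # one run per hour on the fast machine
pool_size = 200                 # workstations in the pool
pool_runs_per_day = 4           # each is slower and only idle part of the day
days = 30

print("fast machine:", fast_runs_per_day * days, "runs")               # 720
print("condor pool: ", pool_size * pool_runs_per_day * days, "runs")   # 24000
```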

23 Some HTC Challenges
- Condor does whatever it takes to run your jobs, even if some machines…
  - crash (or are disconnected)
  - run out of disk space
  - don't have your software installed
  - are frequently needed by others
  - are far away & managed by someone else

24 What is ClassAd Matchmaking?
- Condor uses ClassAd matchmaking to make sure that work gets done within the constraints of both users and owners
- Users (jobs) have constraints:
  - "I need an Alpha with 256 MB RAM"
- Owners (machines) have constraints:
  - "Only run jobs when I am away from my desk and never run jobs owned by Bob."
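Real ClassAds are a richer declarative language; the toy Python sketch below captures only the core idea, that a match requires each side's constraint expression to accept the other party. All attribute names here are illustrative:

```python
# Toy matchmaker: both jobs and machines advertise attributes plus a
# Requirements constraint; a match needs both constraints to hold.
job = {
    "Owner": "alice",
    "Requirements": lambda m: m["Arch"] == "ALPHA" and m["Memory"] >= 256,
}
machine = {
    "Arch": "ALPHA", "Memory": 512, "KeyboardIdleMin": 42,
    # "Only run jobs when I am away from my desk and never run jobs owned by Bob."
    "Requirements": lambda j, m: m["KeyboardIdleMin"] > 15 and j["Owner"] != "bob",
}

def matches(job, machine):
    return job["Requirements"](machine) and machine["Requirements"](job, machine)

print(matches(job, machine))  # True: alice's job fits this idle Alpha
```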

25 Condor Pool Architecture

26 Mathematicians Solve NUG30
- Looking for the solution to the NUG30 quadratic assignment problem
- An informal collaboration of mathematicians and computer scientists
- Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1,009 processors) across 8 sites in the U.S. and Italy
- Solution: 14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
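What that CPU-second figure means as a sustained machine count, using only the numbers on the slide:

```python
# "3.46E8 CPU seconds in 7 days" as an average number of busy processors.
cpu_seconds = 3.46e8
wall_seconds = 7 * 24 * 3600
print(f"average CPUs busy: {cpu_seconds / wall_seconds:.0f}")  # ~572
# against a reported peak of 1,009 processors
```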

27 What Is Condor-G?
- Enhanced version of Condor that provides robust job management for the Globus Toolkit
  - Robust replacement for globusrun
  - Provides extensive fault tolerance
  - Brings Condor's job management features to Globus jobs
- Two parts
  - Globus universe (see the sketch below)
  - GlideIn
- An excellent example of applying the general-purpose Globus Toolkit to solve a particular problem (i.e. high-throughput computing) on the Grid
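A minimal sketch of a Globus-universe submit description of the kind Condor-G accepts; "universe = globus" and "globusscheduler" are the historical Condor-G keywords, while the host name and executable are hypothetical:

```python
# Write out a minimal Condor-G submit description, then submit it.
submit = """\
universe        = globus
globusscheduler = grid.example.org/jobmanager-fork
executable      = my_simulation
output          = run.out
error           = run.err
log             = run.log
queue
"""
with open("my_simulation.submit", "w") as f:
    f.write(submit)
# then, with Condor installed:  condor_submit my_simulation.submit
```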

28 Why Use Condor-G?
- Condor: designed to run jobs within a single administrative domain
- Globus Toolkit: designed to run jobs across many administrative domains
- Condor-G: combines the strengths of both

29 Web Services
- Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
- XML and XML Schema: representing data in a portable format
- WSDL (Web Services Description Language): interface definition language for Web services
- SOAP (Simple Object Access Protocol): XML-based RPC protocol; common WSDL target
- WS-Inspection: conventions for locating service descriptions
- UDDI (Universal Description, Discovery, & Integration): directory for Web services
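To make the SOAP role concrete, here is a sketch of an XML-based RPC request built with only the Python standard library; the endpoint, operation name, and namespace are invented for illustration:

```python
import urllib.request

# A minimal SOAP 1.1 envelope for a hypothetical getTemperature operation.
envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getTemperature xmlns="urn:example-weather">
      <city>Southampton</city>
    </getTemperature>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "http://example.org/weather",            # hypothetical service endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "urn:example-weather#getTemperature"},
)
# response = urllib.request.urlopen(req)     # would perform the RPC call
```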

30 [image-only slide; no text transcribed]

31 “New” Globus: Open Grid Services Architecture (OGSA)
- Service orientation to virtualize resources
- From Web services:
  - Standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
- Building on the Globus Toolkit:
  - Grid service: semantics for service interactions
  - Management of transient instances (& state)
  - Factory, Registry, Discovery, other services
  - Reliable and secure transport
- Multiple hosting targets: J2EE, .NET, "C", …
http://www.globus.org/research/papers/ogsa.pdf
http://www.globus.org/research/papers/gsspec.pdf

32 OGSA Service Model
- System comprises (typically a few) persistent services & (potentially many) transient services
- All services adhere to specified Grid service interfaces and behaviours
  - Reliable invocation, lifetime management, discovery, authorization, notification, upgradeability, concurrency, manageability
- Interfaces for managing Grid service instances
  - Factory, registry, discovery, lifetime, etc.
=> Reliable, secure management of distributed state
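The class names below follow the slide's vocabulary, but this is only a toy sketch of the factory/registry/lifetime pattern, not the actual OGSA port types:

```python
import time
import uuid

class GridService:
    """Toy transient service instance with a soft-state lifetime."""
    def __init__(self, lifetime_s):
        self.handle = str(uuid.uuid4())          # unique service handle
        self.expires = time.time() + lifetime_s

    def keepalive(self, lifetime_s):
        # Clients must renew periodically, or the instance is reclaimed.
        self.expires = time.time() + lifetime_s

class Factory:
    """Persistent service that creates transient instances and registers them."""
    def __init__(self, registry):
        self.registry = registry

    def create(self, lifetime_s=60):
        svc = GridService(lifetime_s)
        self.registry[svc.handle] = svc          # register for discovery
        return svc

registry = {}                                    # discovery: handle -> instance
factory = Factory(registry)
svc = factory.create(lifetime_s=30)
print(svc.handle in registry)                    # True, until the lifetime lapses
```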

33 Using OGSA to Construct Grid Environments
[Figure: three configurations built from Factory, Registry, Mapper, and Service components: (a) a simple hosting environment; (b) a virtual hosting environment composed of several simple ones; (c) compound end-to-end services layered over multiple hosting environments.]
In each case, the Registry handle is effectively the unique name for the virtual organization.

34 Evolution of Globus
- Initial exploration (1996-1999; Globus 1.0)
  - Extensive application experiments; core protocols
- Data Grids (1999-??; Globus 2.0+)
  - Large-scale data management and analysis
- Open Grid Services Architecture (2001-??; Globus 3.0)
  - Integration with Web services, hosting environments, resource virtualization
  - Databases, higher-level services
- Radically scalable systems (2003-??)
  - Sensors, wireless, ubiquitous computing

35 Summary
- The Grid problem: resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: protocol and service definitions for interoperability & resource sharing
- Grid middleware
  - Globus Toolkit: a source of protocol and API definitions, and of reference implementations
  - Open Grid Services Architecture: the next step in the evolution
  - Condor: high-throughput computing
  - Web Services & W3C: leveraging e-business
- e-Science projects applying Grid concepts to applications

