Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004.

1 Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

2 Overview
• General Introduction to Grid Computing
  - Introduction: Why Grids?
  - Applications for Grids
  - Basic Grid Architecture
  - Grid Platforms & Standards
• Issues in Grid Computing
  - Hardware: Blade Computers
  - System Management: Globus Toolkit
  - Software: Scheduling

3 What is Grid Computing?
• A computational and networking infrastructure designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide-area environments

4 Grids Are By Definition Heterogeneous
• It’s about legacy resources, infrastructure, applications, policies, and procedures
• The grid and its administrators must integrate in stealth mode with:
  - Firewalls
  - Filesystems
  - Queuing systems
  - Grumpy systems administrators
  - Tried and true applications

5 A Grid Example

6 Challenges in Grid Computing
• Reliable performance
• Trust relationships between multiple security domains
• Deployment and maintenance of grid middleware across hundreds or thousands of nodes
• Access to data across WANs
• Access to state information of remote processes
• Workflow / dependency management
• Distributed software and license management
• Accounting and billing

7 Applications for a Grid
• Generally, apps that work well on clusters can work well on grids
• Non-interactive / batch jobs
• Parallel computations with minimal interprocess communication and workflow dependencies
• Reasonable data transfer requirements
• Sensible economics
  - Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
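The "sensible economics" rule on this slide is a single inequality, so it can be written as a one-line check. The sketch below is only an illustration: the function name `grid_is_worthwhile` and all figures are hypothetical, and in practice each term would itself be an estimate.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """Return True when the slide's inequality holds:
    productivity gains > cost of building the grid + opportunity costs."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical figures in one currency unit, for illustration only.
print(grid_is_worthwhile(500_000, 300_000, 150_000))  # True: 500000 > 450000
print(grid_is_worthwhile(400_000, 300_000, 150_000))  # False: 400000 < 450000
```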

8 Non-Interactive / Batch Jobs
• Difficult to get a real-time UI for jobs running on the grid
  - A possible interactive application: spreadsheet computation
• Want to take advantage of off-peak free cycles
  - Jobs run for several days, weeks or months
  - The user might prefer to be sleeping while the job runs!
• Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  - Idle thread / “screensaver” computing

9 Seti@Home

10 Parallel Computations
• Application needs to be able to run as multiple, mostly independent pieces
  - Can’t depend on the network’s Quality of Service
  - Can’t rely upon the order of execution and completion
  - Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
  - Grid can still be useful as a meta-scheduler and data source for such apps
    (e.g. the user submits the job to the grid queue and asks for the best available SMP resource)
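A minimal sketch of what "mostly independent pieces" means in code: each piece needs nothing from the others, and the results are combined with an operation that does not care about completion order. Local threads stand in for grid nodes here, and `process_chunk` and `run_independent_pieces` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_chunk(chunk):
    """Hypothetical work unit: needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def run_independent_pieces(chunks, max_workers=4):
    """Run each chunk as a separate task and combine the results in
    whatever order they finish, since completion order is not guaranteed."""
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunks]
        for future in as_completed(futures):
            total += future.result()  # addition is order-independent
    return total


# Ten independent chunks covering the integers 0..999.
data = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
print(run_independent_pieces(data))
```

A job whose pieces had to exchange intermediate values, or had to finish in a fixed order, would not fit this pattern and is the kind of workload the slide steers toward SMP systems.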

11 Some Costs and Benefits
Costs:
• Grid Middleware
• Architects and Developers
• User Training
• Infrastructure Hardware
• Opportunity Costs
  - Would a big SMP box return better results for your problem?
Benefits:
• Better Utilization of Existing Capital Resources
• More Efficient Users
• Ability to complete more work in the same amount of time
  - Performance near or sometimes as good as the big SMP box

12 Basic Grid Architecture
• Clusters and how grids are different from clusters
• Departmental Grid Model
• Enterprise Grid Model
• Global Grid Model

13 What Makes a Cluster a Cluster?
• Uses a Distributed Resource Manager (DRM) to manage job scheduling
• Tightly coupled - High speed, low latency interconnect network
• Fairly homogeneous - Configuration management is important!
• Single administrative domain

14 The Cluster Model
[Diagram: cluster nodes, each with an operating system, storage, compute and a cluster DRM component, linked by a high speed interconnect to a master node that provides the user interface/API and cluster DRM, with shared storage and configuration management]

15 How is an Enterprise Grid Different from a Cluster?
• Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
• Lightly coupled - Connected via 100 or 1000 Mbps Ethernet
• Introduces a resource registry and grid security service
  - But usually only a single registry and security service for the grid
• Not necessarily a single administrative domain

16 The Enterprise Grid Model
[Diagram: clusters (nodes behind a cluster DRM that exposes a grid interface), SMP machines and other systems with their own grid interfaces, and a user interface/API, all connected over the enterprise LAN or WAN to a shared security infrastructure and resource registry]

17 How is a Global Grid Different from an Enterprise Grid?
• "Grid of Grids" - Collection of enterprise grids
• Loosely coupled between sites - Not much control over Quality of Service
• Mutually distrustful administrative domains
• Multiple grid resource registries and grid security services

18 The Global Grid Model
[Diagram: Sites A, B and C, each with its own grid LAN, UI/API, cluster and SMP resources behind grid interfaces, and its own box labelled RRSI, joined over a grid WAN]

19 Grid Platforms & Standards
• The Global Grid Forum http://www.gridforum.org/
• Globus Toolkit
• DCML (Data Center Markup Language)

20 Globus Toolkit V2 “Pillars”
• Information Services (MDS)
• Data Management (GASS)
• Resource Management (GRAM)
• Grid Security Infrastructure (GSI)

21 Globus Toolkit V2 Stack
[Diagram: layered stack, top to bottom]
• MDS, GASS/GridFTP, GRAM
• GSI
• HTTP, LDAP, FTP
• TLS/SSL
• TCP/IP

22 Globus Toolkit V2 Key Components: GRAM, MDS and GASS
• Grid Resource Allocation Manager (GRAM)
  - Server-side: “gatekeeper” process that controls execution of job managers
  - Client-side: “globusrun” UI to launch jobs
• Monitoring and Discovery Service (MDS)
  - GRIS: Grid Resource Information Service collects local info
  - GIIS: Grid Index Information Service collects GRIS info
• Global Access to Secondary Storage (GASS)
  - GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  - Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
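To make the URI form concrete, the sketch below splits the slide's example into scheme, host and path using only the Python standard library. It does not call any Globus tool; `split_grid_uri` is a hypothetical helper name, and the point is simply that a gsiftp URI has the same shape as an ordinary FTP or HTTP URL.

```python
from urllib.parse import urlparse


def split_grid_uri(uri):
    """Split a GridFTP-style URI into (scheme, host, path)."""
    parts = urlparse(uri)
    return parts.scheme, parts.hostname, parts.path


scheme, host, path = split_grid_uri(
    "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"
)
print(scheme)  # gsiftp
print(host)    # node1.ncbiogrid.org
print(path)    # /data/ncbi/ecoli.nt
```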

23 Globus Toolkit V2 Additional Components
• Grid Packaging Tools (GPT)
  - Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
• MPICH-G2
  - A Globus V2 enabled version of MPI (Message Passing Interface)
  - Based on MPICH
  - Utilizes GSI, MDS and GRAM

24 Globus Toolkit V2 Network Services
[Diagram: a network connecting a Certificate Authority, a GIIS server, a client node running the GRAM client, and several grid nodes, each running GRIS, gatekeeper and in.ftpd]

25 GRAM, MDS and GASS Interactions
[Diagram: the client (user / proxy) talks to three services]
• GRAM: gatekeeper process, job manager and resource processes; job allocation and job management; RSL/DUROC/HTTP 1.1
• MDS: GRIS on each resource and a GIIS; resource discovery; LDAP
• GASS: GridFTP through in.ftpd; data transfer and data control; gsiftp

26 Globus Toolkit V2 Strengths and Weaknesses
Strengths:
• Mindshare and collaboration in both industry & academia
• Open source
• Standards-based underpinnings (e.g. SSL, LDAP)
• Flexibility and CoG APIs
• Driving OGSA with heavy resource commitment from IBM
Weaknesses:
• Significant effort required to get applications working on a grid
• Not production quality at this time
• No “metascheduler” -- user has to explicitly tell their jobs where to run

27 Issues in Grid Computing
Hardware: Blades

28 Hardware Trends
• HW trends that enable grids and distributed processing
  - There is a lot of idle computing power
  - Computers are now better connected
  - There are many different brands and configurations in any environment
• Distributed computing in turn gives rise to new HW architectures
  - Blade computers

29 What is a blade?
• A chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board
[Pictures: a blade; a blade chassis with blades; a blade farm]

30 Anatomy of a blade

31 How far can it go?

32 Advantages & Disadvantages
Advantages:
• Low Cost (power, heat, data center space)
• Physical Server Consolidation (Save space, eliminate cables)
• High Availability
• Integrated Systems Management
Disadvantages:
• Not suitable in small numbers
• Need for standardization (for network connection and management)

33 Blades & Grid
• Each blade is a server that can run jobs.
• Blades can be used to form clusters or grids.
• With efficient management, different configurations of blades can be used in a single grid computer.
  - Easy to expand
  - Protects investment

34 Issues in Grid Computing
System Management: Globus Toolkit

35-41 [Slides 35 to 41 repeat slides 20 to 26 on Globus Toolkit V2 verbatim: “Pillars”, Stack, Key Components, Additional Components, Network Services, GRAM/MDS/GASS Interactions, Strengths and Weaknesses]

42 Issues in Grid Computing
Software: Scheduling

43 Superscheduling
• Superscheduling means scheduling resources in multiple administrative domains.
• Various models
  - Submitting a job to a specific single machine
  - Submitting a job to single machines at multiple sites (with cancellation option)
  - Scheduling a single job to use multiple resources
• Most common superscheduler: users
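The second model above (submit to several sites, then cancel the copies that lose) can be sketched as a small deterministic simulation. Everything here is an illustrative assumption: `submit_everywhere` is a hypothetical name, and the queue waits are handed in as known numbers, whereas a real grid only learns which copy starts first when it actually starts.

```python
def submit_everywhere(job, site_wait_times):
    """Queue the same job at every site, keep the copy that starts first,
    and cancel the rest.

    site_wait_times maps a site name to its simulated queue wait in minutes.
    """
    if not site_wait_times:
        raise ValueError("no sites to submit to")
    # The copy with the shortest queue wait is the one that starts first.
    winner = min(site_wait_times, key=site_wait_times.get)
    # Every other copy is cancelled once the winner has started.
    cancelled = sorted(s for s in site_wait_times if s != winner)
    return {"job": job, "runs_at": winner, "cancelled_at": cancelled}


result = submit_everywhere("blast-run", {"siteA": 45, "siteB": 5, "siteC": 20})
print(result["runs_at"])       # siteB
print(result["cancelled_at"])  # ['siteA', 'siteC']
```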

44 Phases Of Superscheduling
• Resource Discovery
  - Authorisation Filtering
  - Application Requirement Definition
  - Minimal Requirement Filtering
• System Selection
  - Gathering Information (Query)
  - Select Systems to run on
• Run the Job
  - Make an Advance Reservation (Optional)
  - Submit Job to Resources
  - Preparation Tasks
  - Monitor Progress
  - Job Completion
  - Completion Tasks
Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
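The three phases can be sketched as a small pipeline. This is an illustration under stated assumptions, not an implementation from the cited source: the `Resource` fields, the function names, and the "shortest queue wins" selection policy are all hypothetical, and the run phase is stubbed as a list of step descriptions.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    authorized: bool   # may this user run here? (authorisation filtering)
    cpus: int          # static capability (minimal requirement filtering)
    memory_gb: int
    queue_length: int  # dynamic state, queried during system selection


def discover(resources, min_cpus, min_memory_gb):
    """Phase 1: keep resources the user may use and that meet the
    application's minimal requirements."""
    allowed = [r for r in resources if r.authorized]
    return [r for r in allowed
            if r.cpus >= min_cpus and r.memory_gb >= min_memory_gb]


def select(candidates):
    """Phase 2: use dynamic information to pick a system.
    Assumed policy: the shortest queue wins."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.queue_length)


def run(job, resource, reserve=False):
    """Phase 3: the run steps in the slide's order, stubbed as strings."""
    steps = [f"reserve {resource.name}"] if reserve else []
    steps += [f"submit {job} to {resource.name}", f"prepare {job}",
              f"monitor {job}", f"complete {job}", f"clean up after {job}"]
    return steps


pool = [
    Resource("cluster1", True, 64, 128, 12),
    Resource("smp1", True, 16, 256, 3),
    Resource("lab-pc", True, 2, 4, 0),      # too small for the job
    Resource("partner", False, 128, 512, 1),  # not authorized
]
chosen = select(discover(pool, min_cpus=8, min_memory_gb=32))
print(chosen.name)  # smp1
print(run("job-42", chosen))
```

Filtering on static properties first keeps the more expensive dynamic queries in phase 2 limited to resources that could actually run the job.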

45 Scheduling Framework (Ranganathan & Foster 2003)
• External Scheduler
• Local Scheduler
• Dataset Scheduler

46 Scheduling And Replication Algorithms
• External Scheduler
  - JobRandom
  - JobLeastLoaded
  - JobDataPresent
  - JobLocal
• Dataset Scheduler
  - DataDoNothing: No active replication. Everything is on demand.
  - DataRandom: Popular datasets are replicated to random sites.
  - DataLeastLoaded: Popular datasets are sent to the least loaded sites.
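The slide gives only the names of the four external-scheduler policies. The sketch below shows one plausible reading of each name; the `Site` fields, the function signatures, and the fallback used when no site holds the data are assumptions, and the exact definitions should be checked against Ranganathan & Foster.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    queued_jobs: int = 0                          # load measure (assumed)
    datasets: set = field(default_factory=set)    # replicas held locally


def job_random(sites, job_dataset, local_site, rng=random):
    """JobRandom: pick any site at random."""
    return rng.choice(sites)


def job_least_loaded(sites, job_dataset, local_site):
    """JobLeastLoaded: pick the site with the fewest queued jobs."""
    return min(sites, key=lambda s: s.queued_jobs)


def job_data_present(sites, job_dataset, local_site):
    """JobDataPresent: pick the least loaded site that already holds the
    job's dataset. Falling back to all sites is an assumption."""
    holders = [s for s in sites if job_dataset in s.datasets]
    return job_least_loaded(holders or sites, job_dataset, local_site)


def job_local(sites, job_dataset, local_site):
    """JobLocal: always run where the job was submitted."""
    return local_site


sites = [Site("A", 5, {"d1"}), Site("B", 1), Site("C", 3, {"d1", "d2"})]
print(job_least_loaded(sites, "d1", sites[0]).name)  # B
print(job_data_present(sites, "d1", sites[0]).name)  # C
print(job_local(sites, "d1", sites[0]).name)         # A
```

The contrast between the second and third calls is the trade-off the framework studies: site B has the shortest queue but would need the dataset transferred, while site C already holds it.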

47 Simulation Results
[Charts: Average Response Times; Average Data Transferred]

48 Grid Computing Thank You and Questions?

