Chapter 05: Clustered Systems for Massive Parallelism. N. Xiong, Georgia State University.


2 N. Xiong@ GSU Slide 1 Chapter 05 Clustered Systems for Massive Parallelism N. Xiong Georgia State University

3 N. Xiong@ GSU Slide 2 Chapter 05 Review and Introduction

4 N. Xiong@ GSU Slide 3 Chapter 05 Main Contents: Design Objectives of Clusters and MPPs; Cluster and MPP System Architectures; Design Principles of Clustered Systems; Multiple Job Scheduling and Management; Virtual Clustering and Resource Provisioning; Homework Problems

5 N. Xiong@ GSU Slide 4 Chapter 05 Design Objectives of Clustered Systems: Scalability; Packaging; Control; Homogeneity; Security

6 N. Xiong@ GSU Slide 5 Chapter 05 Design Objectives of Clustered Systems

7 N. Xiong@ GSU Slide 6 Chapter 05 Fundamental Cluster Design Issues: Scalable Performance; Single System Image; Availability Support; Cluster Job Management; Internode Communication; Fault Tolerance and Recovery. Growth of Servers in HPC and HTC Systems.

8 N. Xiong@ GSU Slide 7 Chapter 05 Resource-Sharing in Cluster Systems

9 N. Xiong@ GSU Slide 8 Chapter 05 An Idealized Cluster Architecture: Conventional databases and OLTP monitors offer users a desktop environment. Supports parallel programming based on standard languages and communication libraries. A user-interface subsystem combines the advantages of the Web interface and the Windows GUI.

10 N. Xiong@ GSU Slide 9 Chapter 05 Node Architectures and System Packaging. Two types of cluster nodes: compute nodes and service nodes.

11 N. Xiong@ GSU Slide 10 Chapter 05 Compute Node Examples

12 N. Xiong@ GSU Slide 11 Chapter 05 Modular Packaging of IBM BlueGene/L System

13 N. Xiong@ GSU Slide 12 Chapter 05 Cluster System Interconnects

14 N. Xiong@ GSU Slide 13 Chapter 05 High-Bandwidth Interconnects

15 N. Xiong@ GSU Slide 14 Chapter 05 An InfiniBand Cluster Interconnection Network

16 N. Xiong@ GSU Slide 15 Chapter 05 High-bandwidth Interconnects in Top-500 Systems

17 N. Xiong@ GSU Slide 16 Chapter 05 Hardware, Software, and Middleware Support

18 N. Xiong@ GSU Slide 17 Chapter 05 Design Principles of Clusters. Single-System-Image (SSI) Features: Single System; Single Control; Symmetry; Location Transparent

19 N. Xiong@ GSU Slide 18 Chapter 05 Design Principles of Clusters. Single-System-Image Layers: Application Software Layer; Hardware or Kernel Layer; Middleware Layer

20 N. Xiong@ GSU Slide 19 Chapter 05 Design Principles of Clusters. Single-System-Image Composition: Single Entry Point; Single File Hierarchy; Single I/O, Networking, and Memory Space; Other Desired SSI Features

21 N. Xiong@ GSU Slide 20 Chapter 05 Single Entry Point

22 N. Xiong@ GSU Slide 21 Chapter 05 Single File Hierarchy: It is persistent. It is fault tolerant to some degree. Examples: Network File System (NFS) and Andrew File System (AFS).

23 N. Xiong@ GSU Slide 22 Chapter 05 Single File Hierarchy

24 N. Xiong@ GSU Slide 23 Chapter 05 Single I/O, Networking, and Memory Space: Single Input/Output; Single Networking; Single Point of Control; Single Memory Space

25 N. Xiong@ GSU Slide 24 Chapter 05 Single I/O, Networking, and Memory Space

26 N. Xiong@ GSU Slide 25 Chapter 05 An Example

27 N. Xiong@ GSU Slide 26 Chapter 05 Other Desired SSI Features: Single Job Management System; Single User Interface; Single Process Space

28 N. Xiong@ GSU Slide 27 Chapter 05 Middleware Support for SSI Clustering

29 N. Xiong@ GSU Slide 28 Chapter 05 High Availability Through Redundancy: Reliability; Availability; Serviceability

30 N. Xiong@ GSU Slide 29 Chapter 05 Availability and Failure Rate
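
The slide text gives only the title here. As a minimal sketch, assuming the standard definition Availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to repair, the relationship can be computed as follows. The function names and the sample figures are hypothetical and do not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails on average every 999 hours, repaired in 1 hour.
    a = availability(mttf_hours=999, mttr_hours=1)
    print(f"availability = {a:.3f}")
    print(f"downtime     = {downtime_hours_per_year(a):.2f} hours/year")
```

Under this definition, availability improves either by raising MTTF (fewer failures) or by lowering MTTR (faster repair).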

31 N. Xiong@ GSU Slide 30 Chapter 05 Availability Values of Several Representative Systems

32 N. Xiong@ GSU Slide 31 Chapter 05 Redundancy Techniques

33 N. Xiong@ GSU Slide 32 Chapter 05 Fault-Tolerant Cluster Configurations: Hot Standby; Mutual Takeover; Fault-Tolerance

34 N. Xiong@ GSU Slide 33 Chapter 05 Recovery Schemes: Backward recovery; Forward recovery (in real-time systems)

35 N. Xiong@ GSU Slide 34 Chapter 05 Checkpointing and Recovery Techniques: Kernel, Library, and Application Levels; Checkpoint Overheads; Choosing an Optimal Checkpoint Interval
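
The slide lists the choice of an optimal checkpoint interval without giving a formula. One widely used first-order estimate is Young's approximation, T ≈ sqrt(2 · C · MTBF), where C is the time to write one checkpoint and MTBF is the mean time between failures. The sketch below uses that approximation as an assumption; it is not necessarily the formula the slides have in mind, and the function name and sample numbers are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the checkpoint interval, in seconds.

    checkpoint_cost_s: time needed to write one checkpoint
    mtbf_s: mean time between failures
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical job: 50 s per checkpoint, one failure every 10,000 s on average.
    interval = young_checkpoint_interval(checkpoint_cost_s=50, mtbf_s=10_000)
    print(f"checkpoint roughly every {interval:.0f} s")
```

The trade-off is the one the slide names: checkpointing too often wastes time on checkpoint overhead, while checkpointing too rarely loses more work at each failure.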

36 N. Xiong@ GSU Slide 35 Chapter 05 Checkpointing Parallel Programs

37 N. Xiong@ GSU Slide 36 Chapter 05 Cluster Job Scheduling and Management. Cluster Job Management Issues: A user server; A job scheduler; A resource manager

38 N. Xiong@ GSU Slide 37 Chapter 05 Cluster Job Types: Serial jobs; Parallel jobs; Interactive jobs; Batch jobs; Foreign jobs

39 N. Xiong@ GSU Slide 38 Chapter 05 Multi-Job Scheduling Schemes

40 N. Xiong@ GSU Slide 39 Chapter 05 Share Cluster Nodes: Dedicated Mode; Space Sharing; Time Sharing

41 N. Xiong@ GSU Slide 40 Chapter 05 Migration Schemes Issues: Node Availability; Migration Overhead; Recruitment Threshold (the amount of time a workstation stays unused before the cluster considers it an idle node)
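
The recruitment threshold defined above can be illustrated with a small sketch: a workstation becomes a candidate idle node once it has been unused for at least the threshold. The `Workstation` type, the `recruitable` function, and the sample values are hypothetical names chosen for illustration, not part of any cluster scheduler's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workstation:
    name: str
    idle_seconds: float  # time since the owner last used the machine


def recruitable(nodes: List[Workstation], threshold_seconds: float) -> List[str]:
    """Names of workstations unused for at least the recruitment threshold."""
    return [n.name for n in nodes if n.idle_seconds >= threshold_seconds]


if __name__ == "__main__":
    nodes = [
        Workstation("ws1", idle_seconds=30),
        Workstation("ws2", idle_seconds=900),
        Workstation("ws3", idle_seconds=600),
    ]
    # With a 10-minute threshold, only ws2 and ws3 count as idle nodes.
    print(recruitable(nodes, threshold_seconds=600))
```

A lower threshold recruits more nodes but raises the chance that an owner returns soon after, which forces a migration and incurs the migration overhead listed on the same slide.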

42 N. Xiong@ GSU Slide 41 Chapter 05 Virtual Clustering and Resource Provisioning

43 N. Xiong@ GSU Slide 42 Chapter 05 Five Virtual Cluster Research Projects

44 N. Xiong@ GSU Slide 43 Chapter 05 Live VM Migration and Cluster Management

45 N. Xiong@ GSU Slide 44 Chapter 05 Effect by Live Migration

46 N. Xiong@ GSU Slide 45 Chapter 05 Dynamic Virtual Resource Provisioning

47 N. Xiong@ GSU Slide 46 Chapter 05 Autonomic Adaptation of Virtual Environments

48 N. Xiong@ GSU Slide 47 Chapter 05 Some References and Further Reading

49 N. Xiong@ GSU Slide 48 Chapter 05 Homework Problems

50 N. Xiong@ GSU Slide 49 Chapter 05 Homework Problems

