N. GSU Slide 1 Chapter 05 Clustered Systems for Massive Parallelism N. Xiong Georgia State University.


N. GSU Slide 2 Chapter 05 Review and Introduction

N. GSU Slide 3 Chapter 05 Main Contents: Design Objectives of Clusters and MPPs; Cluster and MPP System Architectures; Design Principles of Clustered Systems; Multiple Job Scheduling and Management; Virtual Clustering and Resource Provisioning; Homework Problems

N. GSU Slide 4 Chapter 05 Design Objectives of Clustered Systems: Scalability; Packaging; Control; Homogeneity; Security

N. GSU Slide 5 Chapter 05 Design Objectives of Clustered Systems

N. GSU Slide 6 Chapter 05 Fundamental Cluster Design Issues: Scalable Performance; Single System Image; Availability Support; Cluster Job Management; Internode Communication; Fault Tolerance and Recovery. Growth of Servers in HPC and HTC Systems

N. GSU Slide 7 Chapter 05 Resource-Sharing in Cluster Systems

N. GSU Slide 8 Chapter 05 An Idealized Cluster Architecture: Conventional databases and OLTP monitors offer users a desktop environment. The cluster supports parallel programming based on standard languages and communication libraries. A user-interface subsystem combines the advantages of the Web interface and the Windows GUI.

N. GSU Slide 9 Chapter 05 Node Architectures and System Packaging: Two types of cluster nodes: compute nodes and service nodes

N. GSU Slide 10 Chapter 05 Compute Node Examples

N. GSU Slide 11 Chapter 05 Modular Packaging of IBM BlueGene/L System

N. GSU Slide 12 Chapter 05 Cluster System Interconnects

N. GSU Slide 13 Chapter 05 High-Bandwidth Interconnects

N. GSU Slide 14 Chapter 05 An InfiniBand Cluster Interconnection Network

N. GSU Slide 15 Chapter 05 High-bandwidth Interconnects in Top-500 Systems

N. GSU Slide 16 Chapter 05 Hardware, Software, and Middleware Support

N. GSU Slide 17 Chapter 05 Design Principles of Clusters: Single-System-Image (SSI) Features: Single System; Single Control; Symmetry; Location Transparent

N. GSU Slide 18 Chapter 05 Design Principles of Clusters: Single-System-Image Layers: Application Software Layer; Hardware or Kernel Layer; Middleware Layer

N. GSU Slide 19 Chapter 05 Design Principles of Clusters: Single-System-Image Composition: Single Entry Point; Single File Hierarchy; Single I/O, Networking, and Memory Space; Other Desired SSI Features

N. GSU Slide 20 Chapter 05 Single Entry Point

N. GSU Slide 21 Chapter 05 Single File Hierarchy: It is persistent. It is fault tolerant to some degree. Examples: Network File System (NFS) and Andrew File System (AFS).

N. GSU Slide 22 Chapter 05 Single File Hierarchy

N. GSU Slide 23 Chapter 05 Single I/O, Networking, and Memory Space: Single Input/Output; Single Networking; Single Point of Control; Single Memory Space

N. GSU Slide 24 Chapter 05 Single I/O, Networking, and Memory Space

N. GSU Slide 25 Chapter 05 An Example

N. GSU Slide 26 Chapter 05 Other Desired SSI Features: Single Job Management System; Single User Interface; Single Process Space

N. GSU Slide 27 Chapter 05 Middleware Support for SSI Clustering

N. GSU Slide 28 Chapter 05 High Availability Through Redundancy: Reliability; Availability; Serviceability

N. GSU Slide 29 Chapter 05 Availability and Failure Rate
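
The body of this slide did not survive in the transcript. As a hedged illustration of how availability relates to failure and repair times, the sketch below uses the standard steady-state definition, availability = MTTF / (MTTF + MTTR). The function names and the sample figures are assumptions made for illustration, not values from the slide.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mttf_hours: mean time to failure (average uptime between failures)
    mttr_hours: mean time to repair (average downtime per failure)
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node: fails about every 1000 hours, takes 10 hours to repair.
    a = availability(1000.0, 10.0)
    print(f"availability = {a:.4f}")
    print(f"downtime     = {downtime_hours_per_year(a):.1f} hours/year")
```

The relation shows two ways to raise availability: increase MTTF (more reliable components) or reduce MTTR (faster repair or failover).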

N. GSU Slide 30 Chapter 05 Availability Values of Several Representative Systems

N. GSU Slide 31 Chapter 05 Redundancy Techniques

N. GSU Slide 32 Chapter 05 Fault-Tolerant Cluster Configurations: Hot Standby; Mutual Takeover; Fault-Tolerance

N. GSU Slide 33 Chapter 05 Recovery Schemes: Backward recovery; Forward recovery (in real-time systems)

N. GSU Slide 34 Chapter 05 Checkpointing and Recovery Techniques: Kernel, Library, and Application Levels; Checkpoint Overheads; Choosing an Optimal Checkpoint Interval
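
The slide names the problem of choosing a checkpoint interval, but the transcript does not include its formula. As a stand-in, the sketch below uses Young's first-order approximation, interval ≈ sqrt(2 × checkpoint cost × mean time between failures), which may differ from the formula the lecture uses. All names and sample numbers are illustrative assumptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds
    mtbf_s: mean time between failures, in seconds
    Returns the compute time between checkpoints, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def checkpoint_overhead(checkpoint_cost_s: float, interval_s: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints (no failures)."""
    return checkpoint_cost_s / (interval_s + checkpoint_cost_s)


if __name__ == "__main__":
    # Hypothetical job: 60 s per checkpoint, one failure per day on average.
    interval = young_interval(60.0, 24 * 3600.0)
    print(f"checkpoint every {interval:.0f} s")
    print(f"overhead {checkpoint_overhead(60.0, interval):.2%}")
```

The trade-off it captures: frequent checkpoints waste time writing state, while rare checkpoints lose more work on each failure.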

N. GSU Slide 35 Chapter 05 Checkpointing Parallel Programs

N. GSU Slide 36 Chapter 05 Cluster Job Scheduling and Management: Cluster Job Management Issues: a user server; a job scheduler; a resource manager

N. GSU Slide 37 Chapter 05 Cluster Job Types: Serial jobs; Parallel jobs; Interactive jobs; Batch jobs; Foreign jobs

N. GSU Slide 38 Chapter 05 Multi-Job Scheduling Schemes

N. GSU Slide 39 Chapter 05 Share Cluster Nodes: Dedicated Mode; Space Sharing; Time Sharing

N. GSU Slide 40 Chapter 05 Migration Schemes Issues: Node Availability; Migration Overhead; Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
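
The recruitment threshold defined on this slide can be sketched as a simple idle test: a workstation becomes a candidate node once it has gone unused for at least the threshold. The `Workstation` type, its fields, and the sample times are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workstation:
    name: str
    last_activity_s: float  # time of last local user activity, in seconds


def idle_nodes(
    workstations: List[Workstation],
    now_s: float,
    recruitment_threshold_s: float,
) -> List[str]:
    """Return names of workstations unused for at least the threshold."""
    return [
        w.name
        for w in workstations
        if now_s - w.last_activity_s >= recruitment_threshold_s
    ]


if __name__ == "__main__":
    pool = [
        Workstation("ws-a", last_activity_s=0.0),    # unused for 1000 s
        Workstation("ws-b", last_activity_s=950.0),  # unused for 50 s
    ]
    # With a 300 s threshold only ws-a is recruited.
    print(idle_nodes(pool, now_s=1000.0, recruitment_threshold_s=300.0))
```

A low threshold recruits nodes sooner but raises the chance that the owner returns and the job must be migrated away, which ties this parameter to the migration overhead listed on the same slide.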

N. GSU Slide 41 Chapter 05 Virtual Clustering and Resource Provisioning

N. GSU Slide 42 Chapter 05 Five Virtual Cluster Research Projects

N. GSU Slide 43 Chapter 05 Live VM Migration and Cluster Management

N. GSU Slide 44 Chapter 05 Effect by Live Migration

N. GSU Slide 45 Chapter 05 Dynamic Virtual Resource Provisioning

N. GSU Slide 46 Chapter 05 Autonomic Adaptation of Virtual Environments

N. GSU Slide 47 Chapter 05 Some References and Further Reading

N. GSU Slide 48 Chapter 05 Homework Problems

N. GSU Slide 49 Chapter 05 Homework Problems