1 Architecture Alternatives for Scalable Single System Image Clusters
Rajkumar Buyya, Monash University, Melbourne, Australia.

2 Agenda
What is a Cluster? Enabling Technologies & Motivations
Cluster Architecture
What is Single System Image (SSI)? Levels of SSI
Cluster Middleware vs. Underware
Representative SSI Implementations
Pros and Cons of SSI at Different Levels
Announcement - IEEE Task Force on Cluster Computing
Resources and Conclusions

3 Need for More Computing Power: Grand Challenge Applications
Solving technology problems using computer modeling, simulation, and analysis:
Geographic Information Systems
Life Sciences
Aerospace
Mechanical Design & Analysis (CAD/CAM)

4 Competing Computer Architectures
Vector Computers (VC) -- proprietary systems: provided the breakthrough needed for the emergence of computational science, but they were only a partial answer.
Massively Parallel Processors (MPP) -- proprietary systems: high cost and a low performance/price ratio.
Symmetric Multiprocessors (SMP): suffer from scalability limits.
Distributed Systems: difficult to use and hard to extract parallel performance from.
Clusters -- gaining popularity:
High Performance Computing -- commodity supercomputing
High Availability Computing -- mission-critical applications

5 Technology Trend...
The performance of PC/workstation components has almost reached that of the components used in supercomputers:
Microprocessors (50% to 100% improvement per year)
Networks (Gigabit Ethernet, ...)
Operating Systems
Programming Environments
Applications
The rate of performance improvement of commodity components is extremely high.

6 Technology Trend

7 The Need for Alternative Supercomputing Resources
Cannot afford to buy “Big Iron” machines:
high cost and short life span
funding cut-downs
they don't “fit” into today's funding model
Paradox: the time required to develop a parallel application for solving a GCA is equal to half the life of a parallel supercomputer.

8 Clusters are best-alternative!
Supercomputing-class commodity components are available.
They “fit” very well with today's and future funding models.
Can leverage future technological advances: VLSI, CPUs, networks, disks, memory, caches, OS, programming tools, applications, ...

9 Best of Both Worlds!
High Performance Computing (this talk's focus):
parallel computers / supercomputer-class workstation clusters
dependable parallel computers
High Availability Computing:
mission-critical systems
fault-tolerant computing

10 What is a cluster?
A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
A typical cluster has:
Network: a faster, closer connection than a typical LAN
Low-latency communication protocols
Looser coupling than an SMP

11 So What’s So Different about Clusters?
Commodity Parts?
Communications Packaging?
Incremental Scalability?
Independent Failure?
Intelligent Network Interfaces?
A complete system on every node: virtual memory, scheduler, files, ...
Nodes can be used individually or combined...

12 Clustering of Computers for Collective Computing
(Figure: timeline of clustering for collective computing: 1960, 1990, 1995+)

13 Computer Food Chain (Now and Future)
Demise of Mainframes, Supercomputers, & MPPs

14 Cluster Configuration..1 Dedicated Cluster

15 Cluster Configuration..2 Enterprise Clusters (use a Job Management System such as Codine)
A shared pool of computing resources: processors, memory, disks, interconnect
Guarantee at least one workstation to many individuals (when active)
Deliver a large percentage of the collective resources to a few individuals at any one time

16 Windows of Opportunities
MPP/DSM: compute across multiple systems in parallel.
Network RAM: use idle memory in other nodes; page across other nodes' idle memory.
Software RAID: file system supporting parallel I/O and reliability, mass storage (a parity sketch follows below).
Multi-path Communication: communicate across multiple networks: Ethernet, ATM, Myrinet.
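The Software RAID bullet rests on parity: a redundant block computed as the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. Below is a minimal sketch in C of that mechanism; the stripe width, block size, and in-memory buffers are illustrative assumptions, not details from the talk.

/* Parity sketch for software RAID (RAID-5 style): the parity block is the
   XOR of the data blocks, so any single lost block can be rebuilt by
   XOR-ing the survivors. NDISKS and BLOCK are assumed toy values. */
#include <stdio.h>
#include <string.h>

#define NDISKS 4        /* data blocks per stripe (assumed) */
#define BLOCK  8        /* block size in bytes (tiny, for demonstration) */

/* parity = d0 XOR d1 XOR ... XOR d(n-1) */
static void make_parity(unsigned char data[NDISKS][BLOCK], unsigned char parity[BLOCK])
{
    memset(parity, 0, BLOCK);
    for (int d = 0; d < NDISKS; d++)
        for (int b = 0; b < BLOCK; b++)
            parity[b] ^= data[d][b];
}

/* rebuild a lost block: XOR of the parity and all surviving blocks */
static void rebuild(unsigned char data[NDISKS][BLOCK], unsigned char parity[BLOCK],
                    int lost, unsigned char out[BLOCK])
{
    memcpy(out, parity, BLOCK);
    for (int d = 0; d < NDISKS; d++)
        if (d != lost)
            for (int b = 0; b < BLOCK; b++)
                out[b] ^= data[d][b];
}

int main(void)
{
    unsigned char data[NDISKS][BLOCK] = { "block0", "block1", "block2", "block3" };
    unsigned char parity[BLOCK], recovered[BLOCK];

    make_parity(data, parity);
    rebuild(data, parity, 2, recovered);           /* pretend disk 2 failed */
    printf("recovered: %s\n", (char *)recovered);  /* prints "block2" */
    return 0;
}

The same XOR algebra keeps writes cheap: a small write can update parity as parity_new = parity_old XOR data_old XOR data_new, without re-reading the whole stripe.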

17 Cluster Computer Architecture

18 Major issues in cluster design
Size Scalability (physical & application)
Enhanced Availability (failure management)
Single System Image (look-and-feel of one system)
Fast Communication (networks & protocols)
Load Balancing (CPU, network, memory, disk)
Security and Encryption (clusters of clusters)
Distributed Environment (social issues)
Manageability (administration and control)
Programmability (simple API if required)
Applicability (cluster-aware and non-aware applications)

19 Cluster Middleware and Single System Image

20 A typical Cluster Computing Environment
(Figure, top to bottom: Application; PVM / MPI / RSH; ??? ; Hardware/OS. The "???" layer is the missing link, identified on slide 22.)
A minimal MPI sketch of what runs at the PVM/MPI/RSH layer follows below.
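As a concrete illustration of the message-passing layer, here is a minimal MPI program in C. It uses only standard MPI calls; the greeting pattern is merely an example, not something prescribed by the talk, and build/launch commands (e.g. mpicc, mpirun) vary by MPI distribution.

/* Minimal MPI program: node 0 greets every other node point-to-point. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* join the parallel job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes? */

    if (rank == 0) {
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&rank, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        printf("node 0 of %d: sent greetings\n", size);
    } else {
        int from;
        MPI_Recv(&from, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("node %d of %d: greeted by node %d\n", rank, size, from);
    }

    MPI_Finalize();
    return 0;
}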

21 CC should support
Multi-user, time-sharing environments
Nodes with different CPU speeds and memory sizes (heterogeneous configurations)
Many processes with unpredictable requirements
Unlike an SMP, there are insufficient "bonds" between nodes:
each computer operates independently
resources are utilized inefficiently

22 The missing link is provided by cluster middleware/underware
(Figure, top to bottom: Application; PVM / MPI / RSH; Middleware or Underware; Hardware/OS.)

23 SSI Clusters--SMP services on a CC
“Pool together” the cluster-wide resources
Adaptive resource usage for better performance
Ease of use - almost like an SMP
Scalable configurations - by decentralized control
Result: HPC/HAC at PC/workstation prices

24 What is Cluster Middleware ?
An interface between user applications and the cluster hardware and OS platform.
Middleware packages support each other at the management, programming, and implementation levels.
Middleware layers:
SSI Layer
Availability Layer: enables the cluster services of checkpointing, automatic failover, recovery from failure, and fault-tolerant operation among all cluster nodes.

25 Middleware Design Goals
Complete Transparency (Manageability)
Lets the user see a single cluster system: single entry point, ftp, telnet, software loading...
Scalable Performance
Easy growth of the cluster: no change of API, automatic load distribution.
Enhanced Availability
Automatic recovery from failures: employ checkpointing and fault-tolerance technologies; handle consistency of data when replicated.

26 What is Single System Image (SSI) ?
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource. SSI makes the cluster appear like a single machine to the user, to applications, and to the network. A cluster without an SSI is not a cluster.

27 Benefits of Single System Image
Transparent use of system resources
Transparent process migration and load balancing across nodes
Improved reliability and higher availability
Improved system response time and performance
Simplified system management
Reduction in the risk of operator errors
Users need not be aware of the underlying system architecture to use these machines effectively

28 Desired SSI Services
Single Entry Point: telnet cluster.my_institute.edu lands the user on some node, e.g. node1.cluster.institute.edu (a toy round-robin sketch follows below)
Single File Hierarchy: xFS, AFS, Solaris MC Proxy
Single Control Point: management from a single GUI
Single Virtual Networking
Single Memory Space - Network RAM / DSM
Single Job Management: Glunix, Codine, LSF
Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); it may use Web technology
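To make the single entry point concrete, here is a toy C sketch of a front end that hides many nodes behind one cluster name by handing out nodes round-robin. The node list and naming scheme are illustrative assumptions; real clusters typically achieve this with round-robin DNS or a job management system.

/* Toy "single entry point": one cluster name, many nodes, round-robin.
   The node names below are hypothetical examples. */
#include <stdio.h>

#define NNODES 4

/* pick the next node for an incoming session, round-robin */
static const char *next_node(void)
{
    static const char *nodes[NNODES] = {
        "node1.cluster.my_institute.edu",
        "node2.cluster.my_institute.edu",
        "node3.cluster.my_institute.edu",
        "node4.cluster.my_institute.edu",
    };
    static int next = 0;
    const char *pick = nodes[next];
    next = (next + 1) % NNODES;
    return pick;
}

int main(void)
{
    /* six telnet sessions to cluster.my_institute.edu would land on: */
    for (int i = 0; i < 6; i++)
        printf("session %d -> %s\n", i, next_node());
    return 0;
}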

29 Availability Support Functions
Single I/O Space (SIO): any node can access any peripheral or disk device without knowledge of its physical location.
Single Process Space (SPS): any process on any node can create processes with cluster-wide process IDs, and processes communicate through signals, pipes, etc., as if they were on a single node.
Checkpointing and Process Migration: checkpointing saves the process state and intermediate results from memory to disk, to support rollback recovery when a node fails; process migration is used for load balancing (a minimal checkpoint/restart sketch follows below).
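To make the checkpointing idea concrete, here is a minimal application-level sketch in C: the process periodically saves its state to disk and, after a failure, rolls back to the last checkpoint on restart. The single-struct state and fixed file name are simplifying assumptions; the transparent, system-level checkpointing discussed in this talk is far more involved.

/* Application-level checkpointing with rollback recovery (toy sketch). */
#include <stdio.h>

struct state {
    long iteration;   /* how far the computation has progressed */
    double partial;   /* intermediate result */
};

/* write the state to disk; returns 0 on success */
static int checkpoint(const struct state *s)
{
    FILE *f = fopen("app.ckpt", "wb");
    if (!f) return -1;
    size_t ok = fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

/* on restart, roll back to the last checkpoint if one exists */
static int restore(struct state *s)
{
    FILE *f = fopen("app.ckpt", "rb");
    if (!f) return -1;                    /* no checkpoint: fresh start */
    size_t ok = fread(s, sizeof *s, 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

int main(void)
{
    struct state s = { 0, 0.0 };
    if (restore(&s) == 0)
        printf("rolled back to iteration %ld\n", s.iteration);

    for (; s.iteration < 1000000; s.iteration++) {
        if (s.iteration % 100000 == 0)
            checkpoint(&s);               /* periodic checkpoint, before the work */
        s.partial += 1.0 / (s.iteration + 1);   /* the "work" */
    }
    printf("done: %f\n", s.partial);
    return 0;
}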

30 Scalability Vs. Single System Image
(Figure: scalability vs. single system image; UP = uniprocessor.)

31 SSI Levels/How do we implement SSI ?
SSI relies on the computer-science notion of levels of abstraction (a house is at a higher level of abstraction than walls, ceilings, and floors). SSI can be implemented at the:
Application and Subsystem Level
Operating System Kernel Level
Hardware Level

32 SSI Characteristics
1. Every SSI has a boundary.
2. Single-system support can exist at different levels within a system, one able to be built on another.

33 SSI Boundaries -- an application's SSI boundary
(Figure: a batch system's SSI boundary. Source: In Search of Clusters)

34 SSI via the OS path!
1. Build as a layer on top of the existing OS
Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time; i.e., new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath.
E.g.: Glunix
2. Build SSI at the kernel level - a true cluster OS
Good, but it can't leverage OS improvements from the vendor.
E.g.: UnixWare, Solaris-MC, and MOSIX

35 SSI Representative Systems
OS-level SSI: SCO NSC UnixWare, Solaris-MC, MOSIX, ...
Middleware-level SSI: PVM, TreadMarks (DSM), Glunix, Condor, Codine, Nimrod, ...
Application-level SSI: PARMON, Parallel Oracle, ...

36 SCO NonStop® Cluster for UnixWare
(Figure: identical software stacks run on each UP or SMP node, connected to the other nodes by ServerNet™. Each node's stack, top to bottom:)
Users, applications, and systems management
Standard OS kernel calls
Extensions (modular kernel extensions)
Standard SCO UnixWare® with clustering hooks
Devices

37 How does NonStop Clusters Work?
Modular extensions and hooks provide:
Single clusterwide filesystem view
Transparent clusterwide device access
Transparent swap-space sharing
Transparent clusterwide IPC
High-performance internode communications
Transparent clusterwide processes, migration, etc.
Node-down cleanup and resource failover
Transparent clusterwide parallel TCP/IP networking
Application availability
Clusterwide membership and cluster time sync
Cluster system administration
Load leveling

38 Solaris-MC: Solaris for MultiComputers
Global file system
Globalized process management
Globalized networking and I/O

39 Solaris MC Components
Object and communication support
High availability support
PXFS global distributed file system
Process management
Networking

40 Multicomputer OS for UNIX (MOSIX)
An OS module (layer) that provides applications with the illusion of working on a single system.
Remote operations are performed like local operations.
Transparent to the application - the user interface is unchanged.
(Figure, top to bottom: Application; PVM / MPI / RSH; MOSIX; Hardware/OS.)

41 Main tool: preemptive process migration, which can migrate any process, anywhere, anytime
Supervised by distributed algorithms that respond on-line to global resource availability - transparently.
Load balancing - migrate processes from over-loaded to under-loaded nodes (a toy decision-rule sketch follows below).
Memory ushering - migrate processes from a node that has exhausted its memory, to prevent paging/swapping.
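As a toy illustration of the load-balancing rule above, the following C sketch repeatedly moves one process from the most-loaded node to the least-loaded one until the imbalance falls under a threshold. The node table, load metric, and threshold are illustrative assumptions; MOSIX's real distributed, online algorithms are considerably more sophisticated.

/* Toy load-balancing decision rule: migrate from the busiest node to the
   idlest node while the load gap exceeds THRESHOLD. */
#include <stdio.h>

#define NNODES    4
#define THRESHOLD 2   /* minimum load gap that justifies a migration */

int main(void)
{
    int load[NNODES] = { 7, 2, 5, 1 };   /* runnable processes per node (assumed) */

    for (;;) {
        int hi = 0, lo = 0;
        for (int n = 1; n < NNODES; n++) {
            if (load[n] > load[hi]) hi = n;
            if (load[n] < load[lo]) lo = n;
        }
        if (load[hi] - load[lo] <= THRESHOLD)
            break;                        /* balanced enough: stop */

        /* "migrate" one process from the busiest to the idlest node */
        load[hi]--;
        load[lo]++;
        printf("migrate one process: node %d -> node %d\n", hi, lo);
    }

    for (int n = 0; n < NNODES; n++)
        printf("node %d load: %d\n", n, load[n]);
    return 0;
}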

42 MOSIX for Linux at HUJI
A scalable cluster configuration:
50 Pentium-II 300 MHz nodes
38 Pentium-Pro 200 MHz nodes (some are SMPs)
16 Pentium-II 400 MHz nodes (some are SMPs)
Over 12 GB cluster-wide RAM
Connected by a Myrinet 2.56 Gb/s LAN
Runs Red Hat 6.0, based on kernel 2.2.7
Upgrades: HW with Intel, SW with Linux
Download MOSIX:

43 Nimrod - A Job Management System

44 Job processing with Nimrod

45 Nimrod Architecture

46 PARMON: A Cluster Monitoring Tool
(Figure: a PARMON server (parmond) runs on each cluster node; the PARMON client (parmon) runs on a JVM and monitors the servers across a high-speed switch.)

47 Resource Utilization at a Glance

48 Relationship Among Middleware Modules

49 Single I/O Space and Design Issues
Globalised Cluster Storage

50 Clusters with & without Single I/O Space
(Figure: without single I/O space, users see per-node disks; with single I/O space services, users see one storage pool.)

51 Benefits of Single I/O Space
Eliminate the gap between accessing local disk(s) and remote disks
Support a persistent programming paradigm
Allow striping on remote disks, accelerating parallel I/O operations
Facilitate the implementation of distributed checkpointing and recovery schemes

52 Single I/O Space Design Issues
Integrated I/O space
Addressing and mapping mechanisms
Data movement procedures

53 Integrated I/O Space
(Figure: one sequential address space spanning the local disks LD1..LDn with blocks D11..Dnt (the RADD space), the shared RAIDs SD1..SDm with blocks B11..Bmk (the NASD space), and the peripherals P1..Ph (the NAP space).)

54 Addressing and Mapping
(Figure, top to bottom: user applications call user-level middleware plus some modified OS system calls; a name agent and a disk/RAID/NAP mapper resolve names and addresses; a block mover and per-device I/O agents carry out the transfers against the RADD, NASD, and NAP spaces.)
A sketch of the mapper's address translation follows below.
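As a concrete sketch of the mapper's job, the following C program translates a linear block address in the integrated I/O space into a (space, device, local block) triple. The space boundaries, device counts, and round-robin striping policy are illustrative assumptions, not details from the talk.

/* Toy disk/RAID mapper: linear address -> (space, device, local block). */
#include <stdio.h>

#define N_LOCAL         4     /* local disks (RADD space), assumed      */
#define N_SHARED        2     /* shared RAIDs (NASD space), assumed     */
#define BLOCKS_PER_DISK 1000  /* blocks per local disk, assumed         */
#define RADD_BLOCKS (N_LOCAL * BLOCKS_PER_DISK)

struct location {
    const char *space;   /* "RADD" or "NASD" */
    int device;          /* which disk/RAID within that space */
    long block;          /* block number on that device */
};

static struct location map_block(long addr)
{
    struct location loc;
    if (addr < RADD_BLOCKS) {
        /* RADD space: each local disk owns a contiguous address range */
        loc.space  = "RADD";
        loc.device = (int)(addr / BLOCKS_PER_DISK);
        loc.block  = addr % BLOCKS_PER_DISK;
    } else {
        /* NASD space: stripe round-robin across the shared RAIDs, so
           consecutive blocks land on different devices (parallel I/O) */
        long off   = addr - RADD_BLOCKS;
        loc.space  = "NASD";
        loc.device = (int)(off % N_SHARED);
        loc.block  = off / N_SHARED;
    }
    return loc;
}

int main(void)
{
    long samples[] = { 0, 1500, 4000, 4001, 4002 };
    for (int i = 0; i < 5; i++) {
        struct location l = map_block(samples[i]);
        printf("addr %ld -> %s device %d, block %ld\n",
               samples[i], l.space, l.device, l.block);
    }
    return 0;
}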

55 Data Movement Procedures
(Figure: the user application on Node 1 requests data block A; the block mover and the I/O agents on both nodes copy A between LD1 on Node 1 and LD2, or an SDi of the NASD, on Node 2, in either direction.)

56 Adoption of the Approach

57 Summary
We have discussed:
Cluster enabling technologies
Cluster architecture & its components
Cluster middleware
Single System Image
Representative SSI tools

58 Concluding Remarks
Clusters are promising:
They solve the parallel processing paradox
They offer incremental growth and match funding patterns
New trends in hardware and software technologies are likely to make clusters more promising and to fill the SSI gap, so that cluster-based supercomputers can be seen everywhere!

59 Breaking High Performance Computing Barriers
(Figure: performance in GFLOPS, up to 2100, climbing from single processor to shared memory to local parallel cluster to global parallel cluster.)

60 Announcement: formation of
IEEE Task Force on Cluster Computing (TFCC)

61 Thank You ... ?

62 Appendix: Pointers to Literature on Cluster Computing

63 Reading Resources..1a Internet & WWW
Computer Architecture
PFS & Parallel I/O
Linux Parallel Processing
DSMs

64 Reading Resources..1b Internet & WWW
Solaris-MC
Microprocessors: Recent Advances
Beowulf
MOSIX

65 Reading Resources..2 Papers
T. Anderson, D. Culler, and D. Patterson, "A Case for NOW (Networks of Workstations)", IEEE Micro, Feb. 1995.
K. Hwang, H. Jin, et al., "Designing SSI Clusters with Hierarchical Checkpointing and Single I/O Space", IEEE Concurrency, March 1999.
M. Baker and R. Buyya, "Cluster Computing: The Commodity Supercomputer", Software - Practice and Experience, June 1999.
B. Walker and D. Steel, "Implementing a Full Single System Image UnixWare Cluster: Middleware vs. Underware" (SCO NSC UnixWare).

66 Reading Resources..3 Books
G. Pfister, In Search of Clusters, Prentice Hall, 2nd ed., 1998.
R. Buyya (ed.), High Performance Cluster Computing, Volume 1: Architectures and Systems; Volume 2: Programming and Applications, Prentice Hall, 1999.
K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw Hill, 1998.

