Low Cost Supercomputing

Slides:



Advertisements
Similar presentations
Client/Server Computing (the wave of the future) Rajkumar Buyya School of Computer Science & Software Engineering Monash University Melbourne, Australia.
Advertisements

PARMON A Comprehensive Cluster Monitoring System PARMON Team Centre for Development of Advanced Computing, Bangalore, India Contact: Rajkumar Buyya
Clustering Technology For Scaleability Jim Gray Microsoft Research
Clusters, Grids and their applications in Physics David Barnes (Astro) Lyle Winton (EPP)
Multiple Processor Systems
Distributed Processing, Client/Server and Clusters
Hardware & the Machine room Week 5 – Lecture 1. What is behind the wall plug for your workstation? Today we will look at the platform on which our Information.
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Distributed Systems CS
Protocols and software for exploiting Myrinet clusters Congduc Pham and the main contributors P. Geoffray, L. Prylli, B. Tourancheau, R. Westrelin.
2. Computer Clusters for Scalable Parallel Computing
Beowulf Supercomputer System Lee, Jung won CS843.
Types of Parallel Computers
Distributed Processing, Client/Server, and Clusters
NPACI Panel on Clusters David E. Culler Computer Science Division University of California, Berkeley
High Performance Cluster Computing Architectures and Systems
OCT1 Principles From Chapter One of “Distributed Systems Concepts and Design”
Cluster Computing Slides by: Kale Law. Cluster Computing Definition Uses Advantages Design Types of Clusters Connection Types Physical Cluster Interconnects.
NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 1 Cluster Archtectures and.
07/14/08. 2 Points Introduction. Cluster and Supercomputers. Cluster Types and Advantages. Our Cluster. Cluster Performance. Cluster Computer for Basic.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
Chapter 2 Computer Clusters Lecture 2.1 Overview.
Infrastructure and Tools
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
CLUSTER COMPUTING Prepared by: Kalpesh Sindha (ITSNS)
Parallel Computing The Bad News –Hardware is not getting faster fast enough –Too many architectures –Existing architectures are too specific –Programs.
1 Computing platform Andrew A. Chien Mohsen Saneei University of Tehran.
1 In Summary Need more computing power Improve the operating speed of processors & other components constrained by the speed of light, thermodynamic laws,
KUAS.EE Parallel Computing at a Glance. KUAS.EE History Parallel Computing.
Computer System Architectures Computer System Software
1 Lecture 20: Parallel and Distributed Systems n Classification of parallel/distributed architectures n SMPs n Distributed systems n Clusters.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 1: Introduction What is an Operating System? Mainframe Systems Desktop Systems.
CLUSTER COMPUTING STIMI K.O. ROLL NO:53 MCA B-5. INTRODUCTION  A computer cluster is a group of tightly coupled computers that work together closely.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
PARMON A Comprehensive Cluster Monitoring System A Single System Image Case Study Developer: PARMON Team Centre for Development of Advanced Computing,
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Cluster Workstations. Recently the distinction between parallel and distributed computers has become blurred with the advent of the network of workstations.
Loosely Coupled Parallelism: Clusters. Context We have studied older archictures for loosely coupled parallelism, such as mesh’s, hypercubes etc, which.
Types of Operating Systems
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
PARALLEL COMPUTING overview What is Parallel Computing? Traditionally, software has been written for serial computation: To be run on a single computer.
CLUSTER COMPUTING TECHNOLOGY BY-1.SACHIN YADAV 2.MADHAV SHINDE SECTION-3.
Types of Operating Systems 1 Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
Distributed Systems Unit – 1 Concepts of DS By :- Maulik V. Dhamecha Maulik V. Dhamecha (M.Tech.)
COMP381 by M. Hamdi 1 Clusters: Networks of WS/PC.
3/12/2013Computer Engg, IIT(BHU)1 CONCEPTS-3. Clusters Classification Application Target ● High Performance (HP) Clusters ➢ Grand Challenging Applications.
Tackling I/O Issues 1 David Race 16 March 2010.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Cluster computing. 1.What is cluster computing? 2.Need of cluster computing. 3.Architecture 4.Applications of cluster computing 5.Advantages of cluster.
Background Computer System Architectures Computer System Software.
High Performance Cluster Computing Architectures and Systems Book Editor: Rajkumar Buyya Slides Prepared by: Hai Jin Internet and Cluster Computing Center.
SYSTEM MODELS FOR ADVANCED COMPUTING Jhashuva. U 1 Asst. Prof CSE
Chapter 16 Client/Server Computing Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
OpenMosix, Open SSI, and LinuxPMI
Berkeley Cluster Projects
CLUSTER COMPUTING Presented By, Navaneeth.C.Mouly 1AY05IS037
Grid Computing.
Constructing a system with multiple computers or processors
University of Technology
Architecture Alternatives for Scalable Single System Image Clusters
What is Parallel and Distributed computing?
Single System Image and Cluster Middleware
CLUSTER COMPUTING.
Constructing a system with multiple computers or processors
Constructing a system with multiple computers or processors
Constructing a system with multiple computers or processors
Types of Parallel Computers
Cluster Computers.
Presentation transcript:

Low Cost Supercomputing No Parallel Processing on Linux Clusters Rajkumar Buyya, Monash University, Melbourne, Australia. rajkumar@ieee.org http://www.dgs.monash.edu.au/~rajkumar

Agenda Cluster ? Enabling Tech. & Motivations Cluster Architecture Cluster Components and Linux Parallel Processing Tools on Linux Cluster Facts Resources and Conclusions

Need of more Computing Power: Grand Challenge Applications Solving technology problems using computer modeling, simulation and analysis Geographic Information Systems Life Sciences Aerospace Mechanical Design & Analysis (CAD/CAM)

Two Eras of Computing Architectures System Software Applications P.S.Es Sequential Era Parallel Era 1940 50 60 70 80 90 2000 2030 Commercialization R & D Commodity

Competing Computer Architectures Vector Computers (VC) ---proprietary system provided the breakthrough needed for the emergence of computational science, buy they were only a partial answer. Massively Parallel Processors (MPP)-proprietary system high cost and a low performance/price ratio. Symmetric Multiprocessors (SMP) suffers from scalability Distributed Systems difficult to use and hard to extract parallel performance. Clusters -- gaining popularity High Performance Computing---Commodity Supercomputing High Availability Computing ---Mission Critical Applications

Technology Trend... Performance of PC/Workstations components has almost reached performance of those used in supercomputers… Microprocessors (50% to 100% per year) Networks (Gigabit ..) Operating Systems Programming environment Applications Rate of performance improvements of commodity components is too high.

Technology Trend

The Need for Alternative Supercomputing Resources Cannot afford to buy “Big Iron” machines due to their high cost and short life span. cut-down of funding don’t “fit” better into today's funding model. …. Paradox: time required to develop a parallel application for solving GCA is equal to: half Life of Parallel Supercomputers.

Clusters are best-alternative! Supercomputing-class commodity components are available They “fit” very well with today’s/future funding model. Can leverage upon future technological advances VLSI, CPUs, Networks, Disk, Memory, Cache, OS, programming tools, applications,...

parallel computers/supercomputer-class workstation cluster Best of both Worlds! High Performance Computing (talk focused on this) parallel computers/supercomputer-class workstation cluster dependable parallel computers High Availability Computing mission-critical systems fault-tolerant computing

What is a cluster? A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource. A typical cluster: Network: Faster, closer connection than a typical network (LAN) Low latency communication protocols Looser connection than SMP

So What’s So Different about Clusters? Commodity Parts? Communications Packaging? Incremental Scalability? Independent Failure? Intelligent Network Interfaces? Complete System on every node virtual memory scheduler files … Nodes can be used individually or combined...

Clustering of Computers for Collective Computating 1990 1995+ 1960

Computer Food Chain (Now and Future) Demise of Mainframes, Supercomputers, & MPPs 2

Cluster Configuration..1 Dedicated Cluster 3

Cluster Configuration..2 Enterprise Clusters (use JMS like Codine) Shared Pool of Computing Resources: Processors, Memory, Disks Interconnect Guarantee at least one workstation to many individuals (when active) Deliver large % of collective resources to few individuals at any one time 3

Windows of Opportunities MPP/DSM: Compute across multiple systems: parallel. Network RAM: Idle memory in other nodes. Page across other nodes idle memory Software RAID: file system supporting parallel I/O and reliability, mass-storage. Multi-path Communication: Communicate across multiple networks: Ethernet, ATM, Myrinet

Cluster Computer Architecture

Major issues in cluster design Size Scalability (physical & application) Enhanced Availability (failure management) Single System Image (look-and-feel of one system) Fast Communication (networks & protocols) Load Balancing (CPU, Net, Memory, Disk) Security and Encryption (clusters of clusters) Distributed Environment (Social issues) Manageability (admin. And control) Programmability (simple API if required) Applicability (cluster-aware and non-aware app.)

Scalability Vs. Single System Image UP

Linux-based Tools for High Availability Computing High Performance Computing

Hardware Linux OS is running/driving... Linux supports networking with PCs (Intel x86 processors) Workstations (Digital Alphas) SMPs (CLUMPS) Clusters of Clusters Linux supports networking with Ethernet (10Mbps)/Fast Ethernet (100Mbps), Gigabit Ethernet (1Gbps) SCI (Dolphin - MPI- 12micro-sec latency) ATM Myrinet (1.2Gbps) Digital Memory Channel FDDI

Communication Software Traditional OS supported facilities (heavy weight due to protocol processing).. Sockets (TCP/IP), Pipes, etc. Light weight protocols (User Level) Active Messages (AM) (Berkeley) Fast Messages (Illinois) U-net (Cornell) XTP (Virginia) Virtual Interface Architecture (industry standard)

Single System Image (SSI) System Availability (SA) Cluster Middleware Resides Between OS and Applications and offers in infrastructure for supporting: Single System Image (SSI) System Availability (SA) SSI makes collection appear as single machine (globalised view of system resources). telnet cluster.myinstitute.edu SA - Check pointing and process migration..

Cluster Middleware OS / Gluing Layers Runtime Systems Solaris MC, Unixware, MOSIX Beowulf “Distributed PID” Runtime Systems Runtime systems (software DSM, PFS, etc.) Resource management and scheduling (RMS): CODINE, CONDOR, LSF, PBS, NQS, etc.

Programming environments Threads (PCs, SMPs, NOW..) POSIX Threads Java Threads MPI http://www-unix.mcs.anl.gov/mpi/mpich/ PVM http://www.epm.ornl.gov/pvm/ Software DSMs (Shmem)

Development Tools Compilers Debuggers Performance Analysis Tools GNU-- www.gnu.org Compilers C/C++/Java/ Debuggers Performance Analysis Tools Visualization Tools

Applications Sequential (benefit from the cluster) Parallel / Distributed (Cluster-aware app.) Grand Challenging applications Weather Forecasting Quantum Chemistry Molecular Biology Modeling Engineering Analysis (CAD/CAM) Ocean Modeling ………… PDBs, web servers,data-mining

Linux Webserver (Network Load Balancing) http://proxy.iinchina.net/~wensong/ippfvs/ High Performance (by serving through light loaded machine) High Availability (detecting failed nodes and isolating them from the cluster) Transparent/Single System view

A typical Cluster Computing Environment Application PVM / MPI/ RSH ??? Hardware/OS

CC should support Multi-user, time-sharing environments Nodes with different CPU speeds and memory sizes (heterogeneous configuration) Many processes, with unpredictable requirements Unlike SMP: insufficient “bonds” between nodes Each computer operates independently Inefficient utilization of resources

Multicomputer OS for UNIX (MOSIX) http://www.mosix.cs.huji.ac.il/ An OS module (layer) that provides the applications with the illusion of working on a single system Remote operations are performed like local operations Transparent to the application - user interface unchanged Application PVM / MPI / RSH MOSIX Offers missing link Hardware/OS

MOSIX is Main tool Preemptive process migration that can migrate--->any process, anywhere, anytime Supervised by distributed algorithms that respond on-line to global resource availability - transparently Load-balancing - migrate process from over-loaded to under-loaded nodes Memory ushering - migrate processes from a node that has exhausted its memory, to prevent paging/swapping

MOSIX for Linux at HUJI 50 Pentium-II 300 MHz A scalable cluster configuration: 50 Pentium-II 300 MHz 38 Pentium-Pro 200 MHz (some are SMPs) 16 Pentium-II 400 MHz (some are SMPs) Over 12 GB cluster-wide RAM Connected by the Myrinet 2.56 G.b/s LAN Runs Red-Hat 6.0, based on Kernel 2.2.7 Upgrade: HW with Intel, SW with Linux Download MOSIX: http://www.mosix.cs.huji.ac.il/

Nimrod - A tool for parametric modeling on clusters http://www.dgs.monash.edu.au/~davida/nimrod.html

Job processing with Nimrod

PARMON: A Cluster Monitoring Tool PARMON Server on each node PARMON Client on JVM PARMON High-Speed Switch parmond parmon 4

Resource Utilization at a Glance

Linux cluster in Top500 Top500 Supercomputing (www.top500.org) Sites declared Avalon(http://cnls.lanl.gov/avalon/), Beowulf cluster, the 113th most powerful computer in the world. 70 processor DEC Alpha cluster Cost: $152K Completely commodity and Free Software price/performance is $15/Mflop, performance similar to 1993’s 1024-node CM-5

Adoption of the Approach

Conclusions Remarks Solve parallel processing paradox Clusters are promising.. Solve parallel processing paradox Offer incremental growth and matches with funding pattern New trends in hardware and software technologies are likely to make clusters more promising and fill SSI gap..so that Clusters based supercomputers (Linux based clusters) can be seen everywhere!

Announcement: formation of IEEE Task Force on Cluster Computing (TFCC) http://www.dgs.monash.edu.au/~rajkumar/tfcc/ http://www.dcs.port.ac.uk/~mab/tfcc/

? Well, Read my book for…. Thank You ... http://www.dgs.monash.edu.au/~rajkumar/cluster/