Slide 1 (IPPS 98): What’s So Different about Cluster Architectures?
David E. Culler, Computer Science Division, U.C. Berkeley


Slide 2: High Performance Clusters “happen”
Many groups have built them.
Many more are using them.
Industry is running with it
– Virtual Interface Architecture
– System Area Networks
A powerful, flexible new design technique

Slide 3: Outline
Quick “guided tour” of Clusters at Berkeley
Three Important Advances
=> Virtual Networks (Alan Mainwaring)
=> Implicit Co-scheduling (Andrea Arpaci-Dusseau)
=> Scalable I/O (Remzi Arpaci-Dusseau)
What it means

Slide 4: Stop 1: HP/fddi Prototype
FDDI on the HP/735 graphics bus.
First fast message layer on non-reliable network

Slide 5: Stop 2: SparcStation NOW
ATM was going to take over the world.
The original INKTOMI

Slide 6: Stop 3: Large Ultra/Myrinet NOW

Slide 7: Stop 4: Massive Cheap Storage
Basic unit: 2 PCs double-ending four SCSI chains
Currently serving Fine Art at

Slide 8: Stop 5: Cluster of SMPs (CLUMPS)
Four Sun E5000s
– 8 processors
– 3 Myricom NICs
Multiprocessor, Multi-NIC, Multi-Protocol
– see S. Lumetta IPPS98

Slide 9: Stop 6: Information Servers
Basic Storage Unit:
– Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM
– scalable backup/restore
Dedicated Info Servers
– web,
– security,
– mail, …
VLANs project into dept.

Slide 10: Stop 7: Millennium PC Clumps
Inexpensive, easy to manage Cluster
Replicated in many departments
Prototype for very large PC cluster

Slide 11: So What’s So Different?
Commodity parts?
Communications Packaging?
Incremental Scalability?
Independent Failure?
Intelligent Network Interfaces?
Complete System on every node
– virtual memory
– scheduler
– files
– ...

Slide 12: Three important system design aspects
Virtual Networks
Implicit co-scheduling
Scalable File Transfer

Slide 13: Communication Performance  Direct Network Access
LogP: Latency, Overhead, and Bandwidth
Active Messages: lean layer supporting programming models
[Figure: axes labeled Latency and 1/BW]
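The LogP terms named on this slide can be turned into a small cost estimate. The sketch below uses the usual LogP accounting (send overhead, network latency, receive overhead, and a per-message gap that is the reciprocal of message bandwidth). The function names and the numbers in the usage note are illustrative assumptions, not measurements from the talk.

```python
def one_way_time(latency, overhead):
    """Estimated time for one short message under LogP.

    The sender pays `overhead`, the network adds `latency`,
    and the receiver pays `overhead` again.
    """
    return overhead + latency + overhead


def burst_time(n_messages, latency, overhead, gap):
    """Estimated time until the last of n pipelined messages is received.

    `gap` is the minimum interval between consecutive sends (1/bandwidth
    in messages per unit time). Successive sends are spaced by whichever
    is larger, the gap or the send overhead.
    """
    if n_messages < 1:
        raise ValueError("n_messages must be at least 1")
    spacing = max(gap, overhead)
    return overhead + (n_messages - 1) * spacing + latency + overhead
```

With hypothetical values of latency 5, overhead 2 and gap 4 (any consistent time unit), `one_way_time(5, 2)` gives 9 and `burst_time(3, 5, 2, 4)` gives 17, which shows how overhead and gap, not only latency, bound what a lean messaging layer can deliver.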

Slide 14: General purpose requirements
Many timeshared processes
– each with direct, protected access
User and system
Client/Server, Parallel clients, parallel servers
– they grow, shrink, handle node failures
Multiple packages in a process
– each may have own internal communication layer
Use communication as easily as memory

Slide 15: Virtual Networks
Endpoint abstracts the notion of “attached to the network”
Virtual network is a collection of endpoints that can name each other.
Many processes on a node can each have many endpoints, each with own protection domain.
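One way to picture the endpoint and virtual network definitions is as a small data structure. This is a minimal in-process sketch, not the NOW implementation: the class name `Endpoint`, the per-endpoint peer table, and the use of `PermissionError` to stand in for a protection check are all assumptions made for illustration.

```python
class Endpoint:
    """Hypothetical model of an endpoint: a process's attachment to the network."""

    def __init__(self, owner_pid, name):
        self.owner_pid = owner_pid   # process that owns this endpoint
        self.name = name
        self.peers = {}              # small index -> Endpoint it may name
        self.inbox = []              # messages delivered to this endpoint

    def send(self, peer_index, payload):
        # Protection: only endpoints in the same virtual network can be named.
        if peer_index not in self.peers:
            raise PermissionError(f"{self.name} cannot name peer {peer_index}")
        self.peers[peer_index].inbox.append((self.name, payload))


def make_virtual_network(endpoints):
    """Let every endpoint in the collection name every other by index."""
    for ep in endpoints:
        ep.peers = {i: other for i, other in enumerate(endpoints)}
```

For example, after `make_virtual_network([a, b])`, `a.send(1, "hi")` delivers to `b`, while an endpoint `c` that was never added to any virtual network cannot send to either of them.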

Slide 16: How are they managed?
How do you get direct hardware access for performance with a large space of logical resources?
Just like virtual memory
– active portion of large logical space is bound to physical resources
[Figure: Process 1, Process 2, Process 3, ..., Process n; Host Memory; Processor; NIC Mem; Network Interface]

Slide 17: Endpoint Transition Diagram
[Figure: endpoint states COLD (Paged Host Memory), WARM (R/O, Paged Host Memory), HOT (R/W, NIC Memory); transition labels Read, Evict, Swap, Write, Msg Arrival]

Slide 18: Network Interface Support
NIC has endpoint frames
Services active endpoints
Signals misses to driver
– using a system endpoint
[Figure: Frame 0 ... Frame 7; Transmit; Receive; EndPoint Miss]
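The virtual-memory analogy can be sketched as a small cache of endpoint frames. The code below is an illustrative model only: the slides show eight frames and misses signalled to the driver, but the least-recently-used replacement policy, the class name `NicFrames`, and the `misses` list standing in for the driver notification are assumptions.

```python
from collections import OrderedDict


class NicFrames:
    """Hypothetical model of a NIC that holds a few resident endpoints."""

    def __init__(self, n_frames=8):
        self.free = list(range(n_frames))   # frame numbers not yet in use
        self.resident = OrderedDict()       # endpoint id -> frame, oldest first
        self.misses = []                    # stands in for misses signalled to the driver

    def access(self, endpoint_id):
        """Return the frame serving endpoint_id, loading it on a miss."""
        if endpoint_id in self.resident:
            # Hit: the endpoint is already bound to NIC memory.
            self.resident.move_to_end(endpoint_id)
            return self.resident[endpoint_id]
        # Miss: the driver must bind the endpoint to a frame.
        self.misses.append(endpoint_id)
        if self.free:
            frame = self.free.pop(0)
        else:
            # Assumed policy: evict the least recently used endpoint.
            _victim, frame = self.resident.popitem(last=False)
        self.resident[endpoint_id] = frame
        return frame
```

As with paging, only the active portion of a large logical space needs to be resident, so a process can own many endpoints while the NIC holds only a handful at a time.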

Slide 19: Solaris System Abstractions
Segment Driver manages portions of an address space
Device Driver manages I/O device
Virtual Network Driver

Slide 20: LogP Performance
Competitive latency
Increased NIC processing
Difference mostly
– ack processing
– protection check
– data structures
– code quality
Virtualization cheap

Slide 21: Bursty Communication among many
[Figure: Client, Server, Msg burst, work]

Slide 22: Multiple VNs, Single-thread Server

Slide 23: Multiple VNs, Multithreaded Server

Slide 24: Perspective on Virtual Networks
Networking abstractions are vertical stacks
– new function => new layer
– poke through for performance
Virtual Networks provide a horizontal abstraction
– basis for building new, fast services

Slide 25: Beyond the Personal Supercomputer
Able to timeshare parallel programs
– with fast, protected communication
Mix with sequential and interactive jobs
Use fast communication in OS subsystems
– parallel file system, network virtual memory, …
Nodes have powerful, local OS scheduler
Problem: local schedulers do not know to run parallel jobs in parallel

Slide 26: Local Scheduling
Schedulers act independently w/o global control
Program waits while trying to communicate with its peers that are not running
x slowdowns for fine-grain programs!
=> need coordinated scheduling

Slide 27: Explicit Coscheduling
Global context switch according to precomputed schedule
How do you build it?
Does it work?

Slide 28: Typical Cluster Subsystem Structures
[Figure: Master-Slave and Peer-to-Peer structures; legend: A = Applications, LS = Local service, GS = Global Service, Communication]

Slide 29: Ideal Cluster Subsystem Structure
Obtain coordination without explicit subsystem interaction, only the events in the program
– very easy to build
– potentially very robust to component failures
– inherently “service on-demand”
– scalable
Local service component can evolve.
[Figure: nodes each with A, LS and GS components]

Slide 30: Three approaches examined in NOW
GLUNIX explicit master-slave (user level)
– matrix algorithm to pick PP
– uses stops & signals to try to force desired PP to run
Explicit peer-peer scheduling assist with VNs
– co-scheduling daemons decide on PP and kick the Solaris scheduler
Implicit
– modify the parallel run-time library to allow it to get itself co-scheduled with standard scheduler
[Figure: the three structures drawn with A, LS, GS and M components]

Slide 31: Problems with explicit coscheduling
Implementation complexity
Need to identify parallel programs in advance
Interacts poorly with interactive use and load imbalance
Introduces new potential faults
Scalability

Slide 32: Why implicit coscheduling might work
Active message request-reply model
Infer non-local state from local observations; react to maintain coordination

observation        implication             action
fast response      partner scheduled       spin
delayed response   partner not scheduled   block

[Figure: WS 1 to WS 4 timesharing Job A and Job B; request, response, spin, sleep]

Slide 33: Obvious Questions
Does it work?
How long do you spin?
What are the requirements on the local scheduler?

Slide 34: How Long to Spin?
Answer: round trip time + 5 x wake-up time
– round-trip to stay scheduled together
– plus wake-up to get scheduled together
– plus wake-up to be competitive with blocking cost
– plus 3 x wake-up to meet “pairwise” cost
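The spin rule above (one round trip plus 1 + 1 + 3 = 5 wake-up times) can be sketched as a two-phase wait: spin for the computed limit, then block. This is a minimal illustration using Python threads, not the run-time library from the talk. The function names, the use of `threading.Event` as the reply notification, and the `block_timeout` parameter are assumptions.

```python
import threading
import time


def spin_limit(round_trip, wake_up):
    """Spin budget from the slide: round trip time + 5 x wake-up time."""
    return round_trip + 5 * wake_up


def spin_then_block(reply_arrived, round_trip, wake_up, block_timeout=None):
    """Wait for a reply: spin while the partner is probably scheduled, then block.

    reply_arrived is a threading.Event set when the response comes in.
    Times are in seconds. Returns "spin" if the reply arrived while
    spinning, otherwise "block" after giving up the processor.
    """
    deadline = time.monotonic() + spin_limit(round_trip, wake_up)
    while time.monotonic() < deadline:
        if reply_arrived.is_set():
            return "spin"      # fast response: partner is scheduled
    # Delayed response: assume the partner is not scheduled and yield.
    reply_arrived.wait(block_timeout)
    return "block"
```

A caller would measure `round_trip` and `wake_up` on its own platform; the slide gives the formula but no specific values.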

Slide 35: Does it work?

Slide 36: Synthetic Bulk-synchronous Apps
Range of granularity and load imbalance
– spin wait 10x slowdown

Slide 37: With mixture of reads
Block-immediate 4x slowdown

Slide 38: Timesharing Split-C Programs

Slide 39: Many Questions
What about
– mix of jobs?
– sequential jobs?
– unbalanced placement?
– Fairness?
– Scalability?
How broadly can implicit coordination be applied in the design of cluster subsystems?

Slide 40: A look at Serious File I/O
Traditional I/O system
NOW I/O system
Benchmark Problem: sort large number of 100 byte records with 10 byte keys
– start on disk, end on disk
– accessible as files (use the file system)
– Datamation sort: 1 million records
– Minute sort: quantity in a minute
[Figure: traditional I/O system versus NOW I/O system; node labels Proc-Mem and P-M]

Slide 41: NOW-Sort Algorithm: 1 pass
Read
– N/P records from disk -> memory
Distribute
– send keys to processors holding result buckets
Sort
– partial radix sort on each bucket
Write
– gather and write records to disk
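The four phases can be sketched in a single process that simulates P processors with lists. This is only an outline under stated assumptions: records are 100-byte `bytes` objects with 10-byte keys as in the benchmark definition, the bucket owner is chosen from the leading key byte (which assumes roughly uniform keys), Python's built-in sort stands in for the partial radix sort, and disk and network transfers are replaced by list operations. The names `owner` and `now_sort_sketch` are hypothetical.

```python
RECORD_SIZE = 100   # bytes per record, from the benchmark definition
KEY_SIZE = 10       # leading bytes of each record form the key


def owner(record, n_procs):
    """Pick the processor whose bucket covers this record's key range.

    Assumption: keys are roughly uniform, so the first key byte
    (0..255) is split evenly across processors.
    """
    return record[0] * n_procs // 256


def now_sort_sketch(records, n_procs):
    n = len(records)
    # Read: each simulated processor takes about N/P records.
    local = [records[p * n // n_procs:(p + 1) * n // n_procs]
             for p in range(n_procs)]
    # Distribute: send each record to the processor holding its bucket.
    buckets = [[] for _ in range(n_procs)]
    for p in range(n_procs):
        for rec in local[p]:
            buckets[owner(rec, n_procs)].append(rec)
    # Sort: each processor sorts its own bucket by key
    # (built-in sort used here in place of a partial radix sort).
    for bucket in buckets:
        bucket.sort(key=lambda r: r[:KEY_SIZE])
    # Write: buckets cover increasing key ranges, so writing them out
    # in processor order yields globally sorted output.
    return [rec for bucket in buckets for rec in bucket]
```

Because every record moves to its final bucket in the distribute phase, each record is read once and written once, which is what makes the algorithm single-pass.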

Slide 42: Key Implementation Techniques
Performance Isolation: highly tuned local disk-to-disk sort
– manage local memory
– manage disk striping
– memory mapped I/O with madvise, buffering
– manage overlap with threads
Efficient Communication
– completely hidden under disk I/O
– competes for I/O bus bandwidth
Self-tuning Software
– probe available memory, disk bandwidth, trade-offs

Slide 43: World-Record Disk-to-Disk Sort
Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

Slide 44: Towards a Cluster File System
Remote disk system built on a virtual network
[Figure: Client, RDlib, RD server, Active msgs]

Slide 45: Streaming Transfer Experiment

Slide 46: Results
Data distribution affects resource utilization
Not delivered bandwidth

Slide 47: I/O Bus crossings

Slide 48: Conclusions
Complete system on every node makes clusters a very powerful architecture.
Extend the system globally
– virtual memory systems,
– schedulers,
– file systems, ...
Efficient communication enables new solutions to classic systems challenges.
Opens a rich set of issues for parallel processing beyond the personal supercomputer.