NOW 1 Berkeley NOW Project David E. Culler Sun Visit May 1, 1998

NOW 2 Project Goals Make a fundamental change in how we design and construct large-scale systems –market reality: »50%/year performance growth => cannot allow 1-2 year engineering lag –technological opportunity: »single-chip “Killer Switch” => fast, scalable communication Highly integrated building-wide system Explore novel system design concepts in this new “cluster” paradigm

NOW 3 Remember the “Killer Micro” Technology change in all markets At many levels: Arch, Compiler, OS, Application (Chart: Linpack and peak performance.)

NOW 4 Another Technological Revolution The “Killer Switch” –single chip building block for scalable networks –high bandwidth –low latency –very reliable »if it’s not unplugged => System Area Networks

NOW 5 One Example: Myrinet 8 bidirectional ports of 160 MB/s each way < 500 ns routing delay Simple - just moves the bits Detects connectivity and deadlock Tomorrow: gigabit Ethernet?

NOW 6 Potential: Snap together large systems: incremental scalability, time / cost to market, independent failure => availability. (Chart: node performance in a large system vs. engineering lag time.)

NOW 7 Opportunity: Rethink O.S. Design Remote memory and processor are closer to you than your own disks! Networking stacks? Virtual memory? File system design?

NOW 8 Example: Traditional File System Clients each hold a local private file cache; a central server holds a global shared file cache and RAID disk storage, attached over a fast channel (HIPPI). Expensive, complex, non-scalable, single point of failure; the server is a bottleneck: server resources at a premium, client resources poorly utilized.

NOW 9 Truly Distributed File System Every node has a processor and a file cache, joined by a scalable low-latency communication network: VM pages to remote memory, network RAID striping across the nodes' disks, local caching plus cluster caching. G = node comm BW / disk BW.
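A back-of-the-envelope reading of the G ratio defined above (the block size B and the two bandwidth symbols are illustrative, not numbers from the slides):

```latex
% Slide's definition of G, and the remote-cache vs. local-disk comparison it implies
G = \frac{BW_{\mathrm{net}}}{BW_{\mathrm{disk}}}, \qquad
t_{\mathrm{remote\,cache}} \approx \frac{B}{BW_{\mathrm{net}}}
  = \frac{1}{G}\cdot\frac{B}{BW_{\mathrm{disk}}}
  \approx \frac{t_{\mathrm{local\,disk}}}{G}
```

So whenever G > 1, a block served from another node's file cache arrives faster than a local disk read (ignoring seek time, which only widens the gap); that is what makes cluster caching and paging to remote memory attractive.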

NOW 10 Fast Communication Challenge Fast processors and fast networks; the time is spent in crossing between them. (Figure: the killer switch connects killer platforms through network interface hardware and communication software, spanning time scales from ns to µs to ms.)

NOW 11 Opening: Intelligent Network Interfaces Dedicated processing power and storage embedded in the network interface. An I/O card today; tomorrow on chip? (Figure: Sun Ultra 170 nodes with a Myricom NIC on the 50 MB/s S-Bus I/O bus, attached to the 160 MB/s Myricom network.)

NOW 12 Our Attack: Active Messages Request / reply small active messages (RPC) Bulk transfer (store & get) Highly optimized communication layer on a range of HW (Figure: a request message invokes a handler at the destination, which issues the reply.)
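A toy illustration of the request/reply handler style (hypothetical names and a single-process, one-slot "network"; the real Active Messages layer carries handler indices over the NIC rather than raw function pointers):

```c
/* Minimal sketch of the Active Messages request/reply idea (hypothetical
 * names, not the actual Berkeley AM API). A message names a handler that
 * runs on arrival; a request handler typically issues the reply, so no
 * blocking receive is ever posted. Both "nodes" live in one process here. */
#include <stdio.h>

typedef struct am_msg {
    void (*handler)(int src, int arg);  /* in a real layer: a handler index */
    int src;                            /* sender's node id */
    int arg;                            /* small payload word */
} am_msg;

/* A one-slot "network" standing in for the NIC queues. */
static am_msg net_slot;
static int    net_full = 0;

static void am_send(int src, void (*h)(int, int), int arg) {
    net_slot = (am_msg){ h, src, arg };
    net_full = 1;
}

/* Poll the network and run the handler of any arrived message. */
static void am_poll(void) {
    if (net_full) {
        net_full = 0;
        net_slot.handler(net_slot.src, net_slot.arg);
    }
}

static void reply_handler(int src, int arg) {
    printf("got reply from node %d: value %d\n", src, arg);
}

/* Request handler: runs at the remote node, computes, sends the reply. */
static void request_handler(int src, int arg) {
    (void)src;
    am_send(/*src=*/1, reply_handler, arg * arg);
}

int main(void) {
    am_send(/*src=*/0, request_handler, 7);  /* node 0 issues a request */
    am_poll();                               /* "node 1" runs the request handler */
    am_poll();                               /* "node 0" runs the reply handler */
    return 0;
}
```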

NOW 13 NOW System Architecture UNIX workstations, each with network interface hardware and communication software, connected by a fast commercial switch (Myrinet). A Global Layer UNIX provides resource management, network RAM, distributed files, and process migration; on top run large sequential apps and parallel apps over Sockets, Split-C, MPI, HPF, vSM.

NOW 14 Outline Introduction to the NOW project Quick tour of the NOW lab Important new system design concepts Conclusions Future Directions

NOW 15 First HP/FDDI Prototype FDDI on the HP/735 graphics bus. First fast message layer on an unreliable network.

NOW 16 SparcStation ATM NOW ATM was going to take over the world. The original INKTOMI Today:

NOW 17 Ultra/Myrinet NOW

NOW 18 Massive Cheap Storage Basic unit: 2 PCs double-ending four SCSI chains Currently serving Fine Art at

NOW 19 Cluster of SMPs (CLUMPS) Four Sun E5000s –8 processors –3 Myricom NICs Multiprocessor, Multi-NIC, Multi-Protocol

NOW 20 Information Servers Basic Storage Unit: –Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM –scalable backup/restore Dedicated Info Servers –web, –security, –mail, … VLANs project into departments.

NOW 21 What’s Different about Clusters? Commodity parts? Communications Packaging? Incremental Scalability? Independent Failure? Intelligent Network Interfaces? Complete System on every node –virtual memory –scheduler –files –...

NOW 22 Three important system design aspects Virtual Networks Implicit co-scheduling Scalable File Transfer

NOW 23 Communication Performance: Direct Network Access LogP: Latency, Overhead, and Bandwidth Active Messages: lean layer supporting programming models (Chart: per-message cost broken into latency and 1/BW.)
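For reference, the standard LogP accounting for a short message and a request/reply pair (model identities, not measurements from the slides): a message costs a send overhead o, a network latency L, and a receive overhead o, and back-to-back injection is limited by the gap g.

```latex
% LogP cost of a short message and of a round trip
T_{\mathrm{one\,way}} \approx o + L + o = L + 2o, \qquad
T_{\mathrm{request/reply}} \approx 2L + 4o, \qquad
\text{message rate} \le \tfrac{1}{g}
```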

NOW 24 Example: NAS Parallel Benchmarks Better node performance than the Cray T3D Better scalability than the IBM SP-2

NOW 25 General purpose requirements Many timeshared processes –each with direct, protected access User and system Client/Server, Parallel clients, parallel servers –they grow, shrink, handle node failures Multiple packages in a process –each may have own internal communication layer Use communication as easily as memory

NOW 26 Virtual Networks Endpoint abstracts the notion of “attached to the network” Virtual network is a collection of endpoints that can name each other. Many processes on a node can each have many endpoints, each with its own protection domain.

NOW 27 How are they managed? How do you get direct hardware access for performance with a large space of logical resources? Just like virtual memory: the active portion of the large logical space is bound to physical resources. (Figure: many processes' endpoints, mapped between host memory and NIC memory by the network interface.)
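One way to picture the virtual-memory analogy is an endpoint table managed like a page table: only the endpoints in active use are bound to the NIC's limited memory, the rest live in host memory until a fault brings them back. This is a hypothetical data-structure sketch, not the actual NOW driver:

```c
/* Endpoints managed like pages (illustrative types and policy only). */
#include <stddef.h>

enum ep_state { EP_ON_NIC, EP_IN_HOST };   /* like resident vs. paged out */

typedef struct endpoint {
    int           vnet_id;    /* which virtual network it belongs to */
    int           owner_pid;  /* protection domain of the owner */
    enum ep_state state;
    void         *queues;     /* send/recv queues, wherever they currently live */
} endpoint;

/* On a send to a non-resident endpoint, evict a victim from the NIC and
 * bind this endpoint, exactly like a page-fault handler would. */
endpoint *ep_fault(endpoint *ep, endpoint *nic_table[], size_t nic_slots) {
    if (ep->state == EP_ON_NIC)
        return ep;                                    /* fast path: already bound */
    size_t victim = (size_t)ep->vnet_id % nic_slots;  /* trivial eviction policy */
    if (nic_table[victim] != NULL)
        nic_table[victim]->state = EP_IN_HOST;        /* spill victim's queues to host memory */
    nic_table[victim] = ep;
    ep->state = EP_ON_NIC;                            /* bind to physical NIC memory */
    return ep;
}
```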

NOW 28 Solaris System Abstractions Segment Driver manages portions of an address space Device Driver manages I/O device Virtual Network Driver

NOW 29 Virtualization is not expensive

NOW 30 Bursty Communication among many virtual networks (Figure: clients and a server exchanging message bursts interleaved with work.)

NOW 31 Sustain high BW with many VN

NOW 32 Perspective on Virtual Networks Networking abstractions are vertical stacks –new function => new layer –poke through for performance Virtual Networks provide a horizontal abstraction –basis for building new, fast services

NOW 33 Beyond the Personal Supercomputer Able to timeshare parallel programs –with fast, protected communication Mix with sequential and interactive jobs Use fast communication in OS subsystems –parallel file system, network virtual memory, … Nodes have powerful, local OS scheduler Problem: local schedulers do not know to run parallel jobs in parallel

NOW 34 Local Scheduling Local schedulers act independently –no global control Program waits while trying to communicate with peers that are not running Severe slowdowns for fine-grain programs! => need coordinated scheduling

NOW 35 Traditional Solution: Gang Scheduling Global context switch according to precomputed schedule Inflexible, inefficient, fault prone

NOW 36 Novel Solution: Implicit Coscheduling Coordinate schedulers using only the communication in the program –very easy to build –potentially very robust to component failures –inherently “service on-demand” –scalable Local service component can evolve.

NOW 37 Why it works Infer non-local state from local observations; react to maintain coordination. Observation / implication / action: fast response => partner scheduled => spin; delayed response => partner not scheduled => block. (Figure: four workstations timesharing jobs A and B; a requesting process spins briefly, then sleeps until the response arrives.)
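The observation-to-action table above is essentially a two-phase waiting rule. Below is a minimal sketch of that rule, assuming hypothetical reply_arrived and block_until_reply primitives and an illustrative spin threshold (the real policy sizes the spin time against the cost of a block/wake cycle plus the expected response time):

```c
/* Two-phase waiting sketch for implicit coscheduling (illustrative only). */
#include <stdbool.h>
#include <unistd.h>

#define SPIN_LIMIT_USEC 50   /* illustrative: roughly a round trip plus a context switch */

extern bool reply_arrived(int req_id);       /* poll the network (assumed primitive) */
extern void block_until_reply(int req_id);   /* sleep; a NIC interrupt wakes us (assumed) */

void wait_for_reply(int req_id) {
    for (int spent = 0; spent < SPIN_LIMIT_USEC; spent++) {
        if (reply_arrived(req_id))
            return;            /* fast response => partner is scheduled, keep going */
        usleep(1);             /* keep spinning a little longer */
    }
    /* Delayed response => partner likely not scheduled: block so the local
     * scheduler can run another job instead of burning CPU. */
    block_until_reply(req_id);
}
```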

NOW 38 Example: Synthetic Programs Range of granularity and load imbalance –spin wait: 10x slowdown

NOW 39 Implicit Coordination Surprisingly effective –real programs –range of workloads –simple and robust Opens many new research questions –fairness How broadly can implicit coordination be applied in the design of cluster subsystems?

NOW 40 A Look at Serious File I/O Traditional I/O system vs. NOW I/O system Benchmark problem: sort a large number of 100-byte records with 10-byte keys –start on disk, end on disk –accessible as files (use the file system) –Datamation sort: 1 million records –Minute sort: as much as you can sort in a minute

NOW 41 World-Record Disk-to-Disk Sort Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

NOW 42 Key Implementation Techniques Performance Isolation: highly tuned local disk-to-disk sort –manage local memory –manage disk striping –memory-mapped I/O with madvise, buffering –manage overlap with threads Efficient Communication –completely hidden under disk I/O –competes for I/O bus bandwidth Self-tuning Software –probe available memory, disk bandwidth, trade-offs
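For the memory-mapped I/O with madvise point, here is a generic POSIX sketch (not the NOW-Sort code) of mapping an input file and advising the VM system of sequential access so it can prefetch aggressively and recycle pages behind the scan:

```c
/* Generic memory-mapped sequential scan with madvise (illustrative). */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int scan_records(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    /* Map the whole file and declare sequential access. */
    unsigned char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED) { close(fd); return -1; }
    madvise(buf, st.st_size, MADV_SEQUENTIAL);

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 100)   /* 100-byte records */
        sum += buf[i];                            /* touch each record's key byte */

    munmap(buf, st.st_size);
    close(fd);
    return (int)(sum & 0x7fffffff);
}
```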

NOW 43 Towards a Cluster File System Remote disk system built on a virtual network (Figure: client-side RDlib talking to an RD server over active messages.)
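A rough sketch of what a client-side remote-disk read over such a virtual network could look like; RDlib and the RD server appear on the slide, but the function names, message layout, and endpoint calls below are hypothetical:

```c
/* Hypothetical client-side remote-disk read built on a small-request /
 * bulk-reply message layer (illustrative, not the actual RDlib API). */
#include <stdint.h>
#include <string.h>

#define RD_BLOCK_SIZE 8192

struct rd_request {            /* small request message: which block we want */
    uint32_t server_disk;
    uint64_t block_no;
};

/* Assumed endpoint layer: send a short request to a server endpoint and
 * wait for the bulk reply to land in the supplied buffer. */
extern void ep_request(int server_ep, const void *msg, size_t len);
extern void ep_wait_bulk_reply(int server_ep, void *buf, size_t len);

/* Read one block from a remote disk server into buf (RD_BLOCK_SIZE bytes). */
void rd_read_block(int server_ep, uint32_t disk, uint64_t block_no, void *buf) {
    struct rd_request req;
    memset(&req, 0, sizeof req);
    req.server_disk = disk;
    req.block_no = block_no;
    ep_request(server_ep, &req, sizeof req);            /* small active message */
    ep_wait_bulk_reply(server_ep, buf, RD_BLOCK_SIZE);   /* bulk-transfer reply */
}
```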

NOW 44 Conclusions Fast, simple Cluster Area Networks are a technological breakthrough Complete system on every node makes clusters a very powerful architecture. Extend the system globally –virtual memory systems, –schedulers, –file systems,... Efficient communication enables new solutions to classic systems challenges.

NOW 45 Millennium Computational Community (Figure: campus departments and groups, including SIMS, C.S., E.E., M.E., BMRC, N.E., IEOR, C.E., MSME, NERSC, Transport, Business, Chemistry, Astrophysics, Biology, Economics, and Math, linked by Gigabit Ethernet.)

NOW 46 Millennium PC Clumps Inexpensive, easy-to-manage clusters, replicated in many departments; prototype for a very large PC cluster.

NOW 47 Proactive Infrastructure Scalable Servers Stationary desktops Information appliances