TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication
David E. Culler
Computer Science Division, U.C. Berkeley
NSF Research Infrastructure Meeting, Aug 7, 1999

Project Goal: "Develop a new type of system which harnesses breakthrough communications technology to integrate a large collection of commodity computers into a powerful resource pool that can be accessed directly through its constituent nodes or through inexpensive media stations."
– SW architecture for global operating system
– programming language support
– advanced applications
– multimedia application development

Project Components
– Driving Applications
– Computational and Storage Core
  » architecture
  » operating systems
  » compiler, language, and library
– High Speed Networking
– Multimedia Shell
"The Building is the Computer"

Use what you build, learn from use, ...
– Develop Enabling Systems Technology
– Develop Driving Applications

Highly Leveraged Project
– Large industrial contribution
  » HP media stations
  » Sun compute stations
  » Sun SMPs
  » Intel media stations
  » Bay Networks ATM, Ethernet
– Enabled several federal grants
  » NOW
  » Titanium, Castle
  » Daedalus, Mash
  » DLIB
– Berkeley Multimedia Research Center

Landmarks
– Top 500 Linpack Performance List
– MPI, NPB performance on par with MPPs
– RSA 40-bit key challenge
– World-leading external sort
– Inktomi search engine
– NPACI resource site
– Sustains 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

Sample of '98 Degrees from Titan
– Amin Vahdat: WebOS
– Steven Lumetta: Multiprotocol Communication
– Wendy Heffner: Multicast Communication Protocols
– Doug Ghormley: Global OS
– Andrea Dusseau: Implicit Co-scheduling
– Armando Fox: TACC Proxy Architecture
– John Byers: Fast, Reliable Bulk Communication
– Elan Amir: Media Gateway
– David Bacon: Compiler Optimization
– Kristen Wright: Scalable Web Cast
– Jeanna Neefe: xFS
– Steven Gribble: Web Caching
– Ian Goldberg: Wingman
– Eshwar Balani: WebOS Security
– Paul Gautier: Scalable Search Engines

Results
– Constructed three prototypes, culminating in the 100-processor UltraSparc NOW, plus extensions:
  » GLUnix global operating system layer
  » Active Messages providing fast, general-purpose user-level communication
  » xFS cluster file system
  » Fast Sockets, MPI, and SVM
  » Titanium and Split-C parallel languages
  » ScaLAPACK libraries
– Heavily used in department and external research
=> instrumental in establishing clusters as a viable approach to large-scale computing
=> transitioned to an NPACI experimental resource
– The Killer App: Scalable Internet Services

First HP/FDDI Prototype
– FDDI on the HP/735 graphics bus
– First fast message layer on a non-reliable network

SparcStation ATM NOW
– ATM was going to take over the world; the Myrinet SAN emerged
– The original Inktomi

Technological Revolution: the "Killer Switch"
– single-chip building block for scalable networks
– high bandwidth
– low latency
– very reliable
  » if it's not unplugged
=> System Area Networks
– 8 bidirectional ports of 160 MB/s each way
– < 500 ns routing delay
– Simple: just moves the bits
– Detects connectivity and deadlock

Aug, 1999NSF RI node Ultra/Myrinet NOW

NOW System Architecture
– Large sequential apps and parallel apps over Sockets, Split-C, MPI, HPF, SVM
– Global Layer UNIX: resource management, network RAM, distributed files, process migration
– UNIX workstations, each with communication SW over network interface HW
– Fast commercial switch (Myrinet)

Software Warehouse
– Coherent software environment throughout the research program
  » billions of bytes of code
– Mirrored externally
– New SWW-NT

Multi-Tier Networking Infrastructure
– Myrinet cluster interconnect
– ATM backbone
– Switched Ethernet
– Wireless

Multimedia Development Support
– Authoring tools
– Presentation capabilities
– Media stations
– Multicast support / MBone

Novel Cluster Designs
– Tertiary Disk
  » very low cost massive storage
  » hosts archive of the Museum of Fine Arts
– Pleiades clusters
  » functionally specialized storage and information servers
  » constant backup and restore at large scale
  » NOW tore apart traditional Auspex servers
– CLUMPS
  » cluster of SMPs with multiple NICs per node

Massive Cheap Storage
– Basic unit: 2 PCs double-ending four SCSI chains
– Currently serving the Fine Arts archive

Information Servers
– Basic storage unit:
  » Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM
  » scalable backup/restore
– Dedicated info servers
  » web, security, mail, …
– VLANs project into the department

Cluster of SMPs (CLUMPS)
– Four Sun E5000s
  » 8 processors
  » 3 Myricom NICs
– Multiprocessor, multi-NIC, multi-protocol

Novel Systems Design
– Virtual networks
  » integrate communication events into the virtual memory system
– Implicit co-scheduling
  » cause local schedulers to co-schedule parallel computations, using two-phase spin-block and observing round-trip times
– Cooperative caching
  » access remote caches, rather than local disk, and enlarge global cache coverage by simple cooperation
– Reactive scalable I/O
– Network virtual memory, Fast Sockets
– ISAAC "active" security
– Internet server architecture
– TACC proxy architecture

Fast Communication
– Fast communication on clusters is obtained through direct access to the network, as on MPPs
– The challenge is to make this general purpose
  » the system implementation should not dictate how it can be used

Virtual Networks
– An endpoint abstracts the notion of being "attached to the network"
– A virtual network is a collection of endpoints that can name each other
– Many processes on a node can each have many endpoints, each with its own protection domain

How Are Endpoints Managed?
– How do you get direct hardware access for performance with a large space of logical resources? Just like virtual memory:
  » the active portion of the large logical space is bound to physical resources
[Figure: processes 1..n in host memory, with active endpoints bound to NIC memory behind the network interface]

Network Interface Support
– NIC has endpoint frames (Frame 0 … Frame 7)
– Services active endpoints (transmit/receive)
– Signals misses to the driver
  » using a system endpoint
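The virtual-memory analogy can be sketched as a small cache of physical endpoint frames backing a large logical endpoint space: active endpoints are bound to frames, and touching an unbound endpoint is a miss that the driver services by evicting the least-recently-used binding. This is a toy model; the frame count, eviction policy, and names are illustrative, not taken from the NOW implementation.

```python
class EndpointFrameCache:
    """Toy sketch of NIC endpoint management: a few physical endpoint
    frames on the NIC back a large logical space of endpoints, managed
    like virtual memory with LRU replacement (policy is illustrative)."""

    def __init__(self, num_frames=8):
        self.num_frames = num_frames
        self.bound = []          # endpoint ids in LRU order (front = oldest)
        self.misses = 0

    def access(self, endpoint_id):
        if endpoint_id in self.bound:        # hit: direct hardware access
            self.bound.remove(endpoint_id)
            self.bound.append(endpoint_id)   # most recently used at back
            return "hit"
        self.misses += 1                     # miss: signal the driver
        if len(self.bound) == self.num_frames:
            self.bound.pop(0)                # evict LRU endpoint binding
        self.bound.append(endpoint_id)       # bind endpoint to a frame
        return "miss"
```

As with page tables, the common case (an active endpoint) never leaves hardware; only the rare rebinding involves the driver through the system endpoint.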

Communication under Load
[Figure: clients bursting messages to a server as work arrives]
=> Use of networking resources adapts to demand
=> VIA (or improvements on it) needs to become widespread

Implicit Coscheduling
– Problem: parallel programs are designed to run in parallel => huge slowdowns under uncoordinated local scheduling
  » gang scheduling is rigid, fault-prone, and complex
– Coordinate schedulers implicitly, using the communication in the program
  » very easy to build, robust to component failures
  » inherently "service on demand", scalable
  » local service component can evolve
[Figure: local scheduling (LS) vs. gang scheduling (GS) of job A across nodes]

Why It Works
– Infer non-local state from local observations
– React to maintain coordination

  observation       | implication             | action
  fast response     | partner scheduled       | spin
  delayed response  | partner not scheduled   | block

[Figure: four workstations running jobs A and B; a requester spins awaiting the response, then sleeps]
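The table above amounts to a two-phase spin-block wait: spin for a small multiple of the expected round-trip time (a fast response implies the partner is scheduled), then block and yield the processor (a delayed response implies it is not). A minimal sketch, with the spin interval and names chosen for illustration rather than taken from the implicit-coscheduling implementation:

```python
import time

def two_phase_wait(reply_event, expected_rtt, spin_factor=2.0):
    """Two-phase spin-block (illustrative): spin while a reply within a
    multiple of the expected round-trip time is still plausible, then
    block so the local scheduler can run another job."""
    deadline = time.monotonic() + spin_factor * expected_rtt
    while time.monotonic() < deadline:   # phase 1: spin (partner likely scheduled)
        if reply_event.is_set():
            return "spun"                # fast response observed
    reply_event.wait()                   # phase 2: block until the reply arrives
    return "blocked"                     # delayed response: partner was descheduled
```

Because every process applies the same local rule, communicating processes drift into running at the same time with no global scheduler at all.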

I/O Lessons from NOW-sort
– A complete system on every node is a powerful basis for data-intensive computing
  » complete disk subsystem
  » independent file systems
  » MMAP, not read; MADVISE
  » full OS => threads
– Remote I/O (with fast communication) provides the same bandwidth as local I/O
– I/O performance is very temperamental
  » variations in disk speeds
  » variations within a disk
  » variations in processing, interrupts, messaging, ...
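The MMAP/MADVISE point can be illustrated with a small sketch: scanning fixed-size records through a memory mapping with a sequential-access hint, rather than through read() calls, lets the VM system manage buffering. The record length and function name here are made up for illustration, not taken from NOW-sort.

```python
import mmap

def sum_key_bytes(path, reclen=100):
    """Illustrative record scan via mmap: sum the first (key) byte of
    each fixed-size record, hinting the kernel about sequential access
    where the platform supports madvise."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)  # read-ahead hint
            for off in range(0, len(m), reclen):
                total += m[off]                  # touch one byte per record
    return total
```

With mmap the page cache is the buffer, so the same pages can back both the file system and the application without an extra copy.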

Reactive I/O
– Loosen data semantics
  » ex: unordered bag of records
– Build flows from producers (e.g., disks) to consumers (e.g., summation)
– Flow data to where it can be consumed
[Figure: disks (D) feeding aggregators (A) through a distributed queue — static vs. adaptive parallel aggregation]
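The distributed-queue idea can be sketched in a few lines: once records are an unordered bag on a shared queue, each consumer pulls as fast as it can, so data naturally flows to faster consumers. This is a single-node simulation with made-up names; the real system spreads the queue across nodes.

```python
import queue
import threading
import time

def adaptive_aggregate(records, consumer_delays):
    """Sketch of adaptive parallel aggregation: producers drop an
    unordered bag of records on a shared queue; consumers of different
    speeds (simulated by per-record delays) pull until it is empty.
    Returns how many records each consumer handled."""
    q = queue.Queue()
    for r in records:
        q.put(r)
    counts = [0] * len(consumer_delays)

    def consume(i, delay):
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return                  # bag exhausted
            time.sleep(delay)           # simulated per-record work
            counts[i] += 1              # this consumer absorbed one record

    threads = [threading.Thread(target=consume, args=(i, d))
               for i, d in enumerate(consumer_delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

A static partition would hand each consumer half the records and stall on the slow one; here the split adapts to observed speed with no central decision.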

Performance Scaling
– Allows more data to go to the faster consumer

Driving Applications
– Inktomi search engine
– World-record disk-to-disk sort
– RSA 40-bit key
– IRAM simulations, turbulence, AMR, linear algebra
– Parallel image processing
– Protocol verification, Tempest, bio, global climate, ...
– Multimedia work drove network-aware transcoding services on demand
  » parallel software-only video effects
  » TACC (transcoding) proxy: Transcend, Wingman
  » MBone media gateway

Transcend Transcoding Proxy
– Application provides services to clients
– Grows/shrinks according to demand, availability, and faults
[Figure: service requests arrive at front-end service threads, backed by caches, a user-profile database, and a manager allocating physical processors]
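The grow/shrink behavior comes down to a manager policy that sizes the pool of service threads to observed demand. A minimal sketch of such a policy, with thresholds and names invented for illustration (they are not from the TACC papers):

```python
import math

def target_workers(pending, per_worker=4, min_workers=1, max_workers=16):
    """Illustrative manager policy: one worker per `per_worker` pending
    requests, clamped to a floor (availability) and a ceiling (the
    physical processors available). The manager re-evaluates this as
    load, availability, and faults change."""
    want = math.ceil(pending / per_worker) if pending else 0
    return max(min_workers, min(max_workers, want))
```

A real manager would also restart workers that fault, which is why the floor is expressed as a policy rather than a fixed allocation.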

UCB CSCW Class
"Sigh… no multicast, no bandwidth, no CSCW class..."
– Problem: enable heterogeneous sets of participants to seamlessly join MBone sessions

A Solution: Media Gateways
– Software agents that enable local processing (e.g., transcoding) and forwarding of source streams
– Offer the isolation of a local rate controller for each source stream
– Controlling bandwidth allocation and format conversion for each source prevents link saturation and accommodates heterogeneity
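A per-stream rate controller can be sketched as a token bucket: the gateway forwards a packet only when the stream's bucket has tokens, capping its bandwidth so no single source saturates the downstream link. The rates, sizes, and class name below are illustrative, not taken from the Berkeley media gateway.

```python
class TokenBucket:
    """Sketch of a per-source-stream rate controller (token bucket).
    Each stream gets its own bucket, so streams are isolated from one
    another: a bursty source exhausts only its own allocation."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps          # refill rate, bytes per second
        self.capacity = burst_bytes   # maximum burst allowance
        self.tokens = burst_bytes
        self.last = 0.0               # timestamp of last refill

    def allow(self, nbytes, now):
        # Refill in proportion to elapsed time, then spend if possible.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True               # forward the packet
        return False                  # over budget: drop or transcode down
```

When `allow` fails, a gateway can fall back to format conversion (a cheaper encoding) instead of dropping, which is exactly the bandwidth-allocation-plus-transcoding combination the slide describes.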

A Solution: Media Gateways (cont.)
[Figure: a media gateway (GW) connects the bandwidth-limited site to the MBone — "Sigh… no multicast, no bandwidth, no MBone... AHA!"]

FIAT LUX: Bringing It All Together
– Combines image-based modeling and rendering, image-based lighting, dynamics simulation, and global illumination in a completely novel fashion to achieve unprecedented levels of scientific accuracy and realism
– Computing requirements
  » 15 days' worth of time for development
  » 5 days for rendering the final piece
  » 4 days for rendering at HDTV resolution on 140 processors
– Storage
  » 72,000 frames, 108 gigabytes of storage
  » 7.2 GB after motion blur
  » 500 MB JPEG
– Premiere at the SIGGRAPH 99 Electronic Theater