Distributed Systems Early Examples

Projects
– NOW, a Network Of Workstations: University of California, Berkeley. Terminated about 1997, after demonstrating the feasibility of its approach.
– Condor: University of Wisconsin-Madison. Started about 1988 and is now an ongoing, worldwide system of shared computing clusters.

NOW: a Network of Workstations
Scale: a NOW system consists of a building-wide collection of machines providing memory, disks, and processors.
Basic ideas:
– Use idle CPU cycles for parallel processing on clusters of workstations
– Use memories as a disk cache to break the I/O bottleneck (slow disk access times)
– Share the resources over fast LANs
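
As a toy illustration of the first idea, a Python sketch of how a scheduler might pick machines whose owners are away (threshold and machine data are invented for illustration; NOW itself ran on UNIX workstations, not in Python):

    # Only machines whose owners have been inactive for a while are offered
    # to guest computations.
    IDLE_THRESHOLD_S = 300  # owner inactive 5 minutes => machine is available

    def idle_machines(machines):
        """machines maps name -> seconds since last keyboard/mouse activity."""
        return [name for name, idle_s in machines.items()
                if idle_s >= IDLE_THRESHOLD_S]

    pool = {"ws01": 1200, "ws02": 15, "ws03": 4000}
    print(idle_machines(pool))  # ['ws01', 'ws03']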

NOW “Opportunities”: Memory
Network RAM: fast, high-bandwidth networks make it reasonable to page across the network.
– Instead of paging out to slow disks, send pages over the fast network to RAM in an idle machine.
Cooperative file caching: improve performance by using network RAM as a very large file cache.
– Shared files can be fetched from another client’s memory rather than from the server’s disk.
– Active clients can extend their disk cache size by using the memory of idle clients.
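
A minimal sketch of that lookup order, with invented latency figures (rough mid-1990s orders of magnitude, for intuition only):

    LOCAL_RAM_US = 1        # local memory hit
    PEER_RAM_US = 500       # fetch over the fast LAN from an idle client's memory
    SERVER_DISK_US = 15000  # seek + transfer from the server's disk

    def read_block(block_id, local_cache, peer_caches):
        """Return (data, cost_us), preferring memory anywhere on the LAN over disk."""
        if block_id in local_cache:                     # 1. local cache hit
            return local_cache[block_id], LOCAL_RAM_US
        for peer in peer_caches:                        # 2. another client's RAM
            if block_id in peer:
                local_cache[block_id] = peer[block_id]  # keep a local copy
                return peer[block_id], PEER_RAM_US
        data = f"<block {block_id} from server disk>"   # 3. fall back to the server
        local_cache[block_id] = data
        return data, SERVER_DISK_US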

NOW Opportunities: “RAWD” (Redundant Arrays of Workstation Disks)
RAID systems provide fast performance by connecting arrays of small disks; by reading and writing data in parallel, throughput is increased.
Instead of hardware RAID, build a software version by striping data across the workstations in the network.
– Especially useful for parallel programs running on separate machines in the network
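
A rough sketch of the striping idea (RAID-0 style, without the redundancy a real RAWD would add); each dict stands in for a remote workstation disk:

    STRIPE_SIZE = 4096  # bytes per stripe unit

    def stripe_write(data, nodes):
        """Scatter data round-robin across the nodes' 'disks'."""
        for i in range(0, len(data), STRIPE_SIZE):
            stripe_index = i // STRIPE_SIZE
            nodes[stripe_index % len(nodes)][stripe_index] = data[i:i + STRIPE_SIZE]

    def stripe_read(length, nodes):
        """Reassemble the stripes in order; a real system would issue these
        reads to all nodes in parallel, which is where the throughput comes from."""
        n_stripes = (length + STRIPE_SIZE - 1) // STRIPE_SIZE
        return b"".join(nodes[i % len(nodes)][i] for i in range(n_stripes))

    disks = [dict() for _ in range(4)]        # four workstation "disks"
    stripe_write(b"x" * 10000, disks)
    assert stripe_read(10000, disks) == b"x" * 10000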

NOW Opportunities: “Parallel Computing”
Harnessing the power of multiple idle workstations in a NOW can support high-performance parallel applications.
NOW principles:
– Avoid going to disk by using RAM on other network nodes (assumes the network is faster than disk).
– Further speedup may be achieved by parallelizing the computation and striping the data across multiple disks.
– Allow user processes to access the network directly rather than going through the operating system.
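
Connecting the first two principles, a small sketch of reading striped data from several machines at once (a thread pool stands in for concurrent LAN transfers; all names are invented):

    from concurrent.futures import ThreadPoolExecutor

    # Four pretend workstation disks, each holding every 4th stripe.
    nodes = [{i: f"stripe-{i};".encode() for i in range(j, 16, 4)}
             for j in range(4)]

    def fetch_stripe(index):
        return nodes[index % len(nodes)][index]   # stands in for a LAN read

    def parallel_read(n_stripes):
        # Issue all stripe reads at once; with 4 machines the transfers
        # overlap, which is where the striping speedup comes from.
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            return b"".join(pool.map(fetch_stripe, range(n_stripes)))

    print(parallel_read(16))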

Berkeley NOW Features
GLUnix (Global Layer UNIX) is a layer on top of the UNIX operating systems running on the workstations.
Applications running on GLUnix see a protected virtual operating system layer that catches UNIX system calls and translates them into GLUnix calls.
Serverless Network File System: xFS
– Avoids the central-server bottleneck
– Cooperative file system (basically, peer-to-peer)
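
The interposition idea in miniature (a hypothetical sketch only; the real GLUnix caught actual UNIX system calls underneath unmodified binaries):

    class GlobalLayer:
        """Stands in for the cluster-wide operating system layer."""
        def open(self, path, mode):
            node = hash(path) % 4           # pretend cluster-wide location lookup
            return f"<{path} opened ({mode}) via node {node}>"

    class InterposedApp:
        """Catches what would be UNIX system calls and forwards them."""
        def __init__(self, layer):
            self._layer = layer
        def open(self, path, mode="r"):          # looks like an ordinary local call...
            return self._layer.open(path, mode)  # ...but becomes a GLUnix call

    app = InterposedApp(GlobalLayer())
    print(app.open("/shared/data.txt"))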

Summary
Successful, in its developers’ opinion; it ran for several years in the late 1990s on the Berkeley CS systems.
Key enabling technologies:
– Scalable, high-performance network
– Fast access to the network for user processes
– Global operating system layer presenting system resources as a true shared pool

Condor
Goal: “…to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources”
– HTC addresses “…problems that require weeks or months of computation to solve. …this type of research need[s] a computing environment that delivers large amounts of computational power over a long period of time.”
– Compare High Performance Computing (HPC), which “…delivers a tremendous amount of power over a short period of time.”
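
The contrast is easiest to see with arithmetic; a back-of-envelope comparison (all numbers invented for illustration):

    pool_machines = 200
    idle_fraction = 0.5            # assume half of each day is harvestable
    days = 30
    htc_cpu_hours = pool_machines * idle_fraction * 24 * days
    print(htc_cpu_hours)           # 72000.0 CPU-hours over a month, HTC-style

    hpc_cpus = 512
    allocation_hours = 24          # one day of dedicated supercomputer time
    print(hpc_cpus * allocation_hours)  # 12288 CPU-hours in one burst, HPC-style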

Overview
Condor can be used to manage computing clusters and is designed to take advantage of idle machines.
Condor lets users submit many jobs at the same time; the result is a tremendous amount of computation with very little user intervention.
There is no need to rewrite code: just relink it against the Condor libraries.
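
In classic Condor, that relinking was done with condor_compile, and jobs were handed over with condor_submit; a sketch of the workflow (file names hypothetical):

    $ condor_compile gcc -o myprog myprog.c   # relink unchanged code with Condor's libraries
    $ condor_submit job.sub                   # queue the job(s) described in job.sub
    $ condor_q                                # watch them progress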

Features
Checkpoints: save the complete state of the process.
– Critical for long-running programs: to recover from crashes, to vacate a machine whose owner has returned, or to migrate the process for other reasons.
Remote system calls: data resides on the home machine, and system calls are directed there. This also protects the host machines.
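
An application-level sketch of the checkpoint idea (Condor’s standard universe did this transparently, at the process level, without code changes; the file name and work loop here are invented):

    import os, pickle

    CKPT = "state.ckpt"   # hypothetical checkpoint file

    def run(total_steps=1_000_000, ckpt_every=10_000):
        # Resume from the last checkpoint if one exists, e.g. after a crash
        # or after being evicted from a machine whose owner came back.
        if os.path.exists(CKPT):
            with open(CKPT, "rb") as f:
                step, acc = pickle.load(f)
        else:
            step, acc = 0, 0.0
        while step < total_steps:
            acc += step * 1e-6            # stand-in for real work
            step += 1
            if step % ckpt_every == 0:    # periodically save complete state
                with open(CKPT, "wb") as f:
                    pickle.dump((step, acc), f)
        return acc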

Features
Jobs can run anywhere in the cluster (which can be a physical cluster, a virtual cluster, or even a single machine).
Different machines have different capabilities; when submitting a job, Condor users can specify the kind of machine they wish to run on.
When sets of jobs are submitted, it is possible to define dependencies, e.g., “don’t run job 3 until jobs 1 and 2 have completed.”
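
Roughly what those two features look like in HTCondor’s own file formats: a submit description stating machine requirements, and a DAGMan file encoding the job-3-after-1-and-2 dependency (file names hypothetical; syntax per modern HTCondor, which may differ in detail from the Condor of this era):

    # job.sub -- submit description: what kind of machine, and how many instances
    executable     = myprog
    arguments      = input.$(Process)
    requirements   = (OpSys == "LINUX") && (Arch == "X86_64")
    request_memory = 2GB
    queue 10

    # diamond.dag -- DAGMan: don't run job3 until job1 and job2 have completed
    JOB job1 job1.sub
    JOB job2 job2.sub
    JOB job3 job3.sub
    PARENT job1 job2 CHILD job3

The DAG is handed to Condor with condor_submit_dag diamond.dag.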

Slides
From a talk by Miron Livny, “The Principles and Power of Distributed Computing”, International Winter School on Grid Computing.
Livny is a professor at the University of Wisconsin-Madison, where he heads the Condor Project and other grid/distributed computing projects and centers.

Condor Daemons
(Slide image: title unknown, by Hans Holbein the Younger, from Historiarum Veteris Testamenti icones, 1543)

Condor Daemons
– master: starts and watches over the other Condor daemons on a machine
– negotiator: matchmaking; pairs queued jobs with machines willing to run them
– collector: gathers status advertisements (ClassAds) from the other daemons
– schedd: maintains the job queue on a submit machine
– startd: represents an execute machine and enforces its owner’s policy
– starter: launches and monitors a job on the execute machine
– shadow: represents a running job on the submit machine and services its remote system calls
– procd: tracks the process families created by daemons and jobs
– kbdd: watches for keyboard and mouse activity so jobs can vacate when the owner returns
– exec: the user’s job itself, running under the starter
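
Two real commands give a feel for the division of labor (pool and queue contents will vary):

    $ condor_status   # machines in the pool, as advertised to the collector
    $ condor_q        # jobs in the local schedd's queue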

Condor today