COS 461 Fall 1997: Workstation Clusters


Workstation Clusters
• replace big mainframe machines with a group of small cheap machines
• get performance of big machines on the cost-curve of small machines
• technical challenges
  – meeting the performance goal
  – providing single system image

Supporting Trends
• economics
  – consumer market in PCs leads to economies of scale and fierce competition among suppliers
    » result: lower cost
  – Gordon Bell’s rule of thumb: double manufacturing volume, cut cost by 10%
• technology
  – PCs are big enough to do interesting things
  – networks have gotten really fast

Models
• machines on desks
  – pool resources among everybody’s desktop machines
• virtual mainframe
  – build a “cluster system” that sits in a machine room
  – use dedicated PCs, dedicated network
  – special-purpose software

Model Comparison
• advantage of machines on desks
  – no hardware to buy
• advantages of virtual mainframe
  – no change to client OS
  – more reliable and secure
  – resource allocation easier
  – better network performance

Resource Pooling
• CPU
  – run each process on the best machine
  – stay close to user
  – balance load
• memory
  – use idle memory to store VM pages, cached disk blocks
• storage
  – distributed file system (already covered)

CPU Pooling
• How should we decide where to run a computation?
• How can we move computations between machines?
• How should shared resources be allocated?

Efficiency of Distributed Scheduling
• queueing theory predicts performance
• assume
  – 10 users
  – each user creates jobs randomly at rate C
  – machine finishes jobs randomly at rate F
• compare three configurations
  – separate machine for each user
  – 10 machines, distributed scheduling
  – a single super-machine (10x faster)

Predicted Response Time
• separate machines: 1 / (F - C)
• super-machine: 1 / (10 (F - C))
• pooled machines: between the other two
  – like separate under light load
  – like super under heavy load
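The three predictions can be checked numerically. The sketch below assumes the standard M/M/1 and M/M/c (Erlang C) queueing models, which the slide does not name explicitly; the function names and the sample rates are illustrative only.

```python
from math import factorial


def mm1_response(arrival, service):
    """Mean response time of an M/M/1 queue: 1 / (service - arrival)."""
    if arrival >= service:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service - arrival)


def mmc_response(servers, arrival, service):
    """Mean response time of an M/M/c queue (Erlang C), one shared job queue."""
    offered = arrival / service  # offered load, measured in busy servers
    if offered >= servers:
        raise ValueError("unstable queue: offered load must be below server count")
    tail = offered ** servers / factorial(servers) * servers / (servers - offered)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)  # probability that an arriving job must queue
    return 1.0 / service + p_wait / (servers * service - arrival)


def compare(users, c, f):
    """Return (separate, pooled, super_machine) mean response times."""
    separate = mm1_response(c, f)                       # one machine per user
    pooled = mmc_response(users, users * c, f)          # shared pool of machines
    super_machine = mm1_response(users * c, users * f)  # one machine, users x faster
    return separate, pooled, super_machine
```

For example, `compare(10, 0.5, 1.0)` places the pooled configuration between the other two, and `compare(10, 0.01, 1.0)` shows the pooled time approaching the separate-machine time under light load.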

Independent Processes
• simplest method (on vanilla Unix)
  – monitor load-average of all machines
  – when a new process is created, put it on the least-loaded machine
  – processes don’t move
• pro: simple
• con: doesn’t balance load unless new processes are created; Unix isn’t location-transparent
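The placement rule above can be sketched in a few lines. This is a minimal illustration, assuming load averages have already been collected into a dictionary; the host names and the tie-breaking rule are hypothetical.

```python
def place_new_process(load_averages):
    """Pick the machine with the lowest load average.

    load_averages maps host name -> most recently reported load average.
    Ties are broken by host name so the choice is deterministic (an assumption).
    """
    if not load_averages:
        raise ValueError("no machines available")
    return min(load_averages, key=lambda host: (load_averages[host], host))
```

Calling `place_new_process({"ws1": 1.5, "ws2": 0.2, "ws3": 0.9})` selects `ws2`. Since the decision is made only at process creation, nothing here corrects an imbalance that develops later.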

Location Transparency
• principle: a process should see itself as running on the machine where it was created
• location-dependencies: process IDs, parts of file system, sockets, etc.
• usual solution
  – run “proxy” process on machine where process was created
  – “system calls” cause RPC to proxy

Process Migration
• idea: move running processes around to balance load
• problems:
  – how to move a running process
  – when to migrate
  – how to gather load information

Moving a Process
• steps
  – stop process, saving all state into memory
  – move memory image to another machine
  – reactivate the memory image
• problems
  – can’t move to machine with different architecture or OS
  – image is big, so expensive to move
  – need to set up proxy process

Migration Policy
• migration can be expensive, so do rarely
• migration balances load, so do often
• many policies exist
• typical design: let imbalance persist for a while before migrating
  – “patience time” is several times the cost of a migration
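One way to read the "patience time" rule is as a timer that restarts whenever the imbalance disappears. The sketch below is an assumption-laden illustration: the class name, the load-difference threshold, and the default factor of 3 are not from the slides.

```python
class PatienceTimer:
    """Migrate only after an imbalance has lasted for the patience time."""

    def __init__(self, migration_cost, patience_factor=3.0, threshold=1.0):
        # Patience time is "several times" the migration cost; the factor is assumed.
        self.patience = patience_factor * migration_cost
        self.threshold = threshold      # load difference treated as an imbalance
        self.imbalanced_since = None    # time at which the current imbalance began

    def should_migrate(self, now, local_load, remote_load):
        if local_load - remote_load < self.threshold:
            self.imbalanced_since = None  # balanced again: reset the timer
            return False
        if self.imbalanced_since is None:
            self.imbalanced_since = now
        return now - self.imbalanced_since >= self.patience
```

With a migration cost of 2 seconds and a factor of 3, a process moves only after the imbalance has persisted for 6 seconds, so short load spikes never trigger a migration.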

Pooling Memory
• some machines need more memory than they have; some need less
• let machines use each other’s memory
  – virtual memory backing store
  – disk block cache
• assume (for now) all nodes use distinct pages and disk blocks

Failure and Memory Pooling
• might lose remotely-stored pages in a crash
• solution: make remote memory servers stateless
• only store pages you can afford to lose
  – for virtual memory: write to local disk, then store copy in remote memory
  – for disk blocks, only store “clean” blocks in remote memory
• drawback: no reduction in writes

Local Memory Management
[Figure: a machine’s memory divided into two blocks, locally-used pages and the global page pool]
• within each block, use LRU replacement

Issues
• how to divide space between local and global pools
  – goal: throw away the least recently used stuff
    » keep (approximate) timestamp of last access for each page
    » throw away the oldest page
• what to do with thrown-away pages
  – really throw away, or migrate to another machine
  – where to migrate

Random Migration
• when evicting page
  – throw away with probability P
  – otherwise, migrate to random machine
    » may immediately re-do at new machine
• good: simple local decisions; generally does OK when load is reasonably balanced
• bad: does 1/P times as much work as necessary; makes bad decisions when load is imbalanced
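The 1/P figure follows from a geometric distribution: if every eviction discards the page with probability P, a page is handled 1/P times on average before it is gone. The simulation below is a sketch of the worst case in which each migrated page is evicted again right away at its new machine; the function names, trial count, and seed are illustrative.

```python
import random


def handlings_until_discard(p_discard, rng):
    """Count the evictions one page goes through before it is thrown away."""
    count = 1
    while rng.random() >= p_discard:  # with probability 1 - P the page migrates
        count += 1                    # and is evicted again at the new machine
    return count


def mean_handlings(p_discard, trials=100_000, seed=0):
    """Estimate the average number of evictions per page by simulation."""
    rng = random.Random(seed)
    total = sum(handlings_until_discard(p_discard, rng) for _ in range(trials))
    return total / trials
```

For `p_discard = 0.25` the estimate comes out close to 4, in line with the 1/P prediction.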

N-chance Forwarding
• forward page N times before discarding it
• forward to random places
• improvement
  – gather hints about oldest page on other machines
  – use hints to bias decision about where to forward pages to
• does a little better than random
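A minimal sketch of the forwarding rule, assuming each page carries a forward counter. The optional `oldest_age` hint table (machine name to age of its oldest page) and the weighting of age plus one are assumptions made for illustration; the slides do not say how hints bias the choice.

```python
import random


def forward_or_discard(page, here, machines, n_chances, rng, oldest_age=None):
    """Return the machine to forward the page to, or None to discard it.

    page is a dict with a "forwards" counter; oldest_age is a hypothetical
    hint table mapping machine -> age of the oldest page held there.
    """
    if page["forwards"] >= n_chances:
        return None                      # the page has used up its N chances
    others = [m for m in machines if m != here]
    if not others:
        return None
    page["forwards"] += 1
    if oldest_age is None:
        return rng.choice(others)        # plain N-chance: uniform random target
    # Hinted variant: prefer machines whose oldest page is older (assumed rule).
    weights = [oldest_age.get(m, 0.0) + 1.0 for m in others]
    return rng.choices(others, weights=weights, k=1)[0]
```

A page created with `{"forwards": 0}` and `n_chances=2` is forwarded twice and discarded on the third eviction.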

Global Memory Management
• idea: always throw away a page that is one of the very oldest
• periodically, gather state
  – mark the oldest 2% of pages as “old”
  – count number of old pages on each machine
  – distribute counts to all machines
• each machine now has an idea of where the old pages are

Global Memory Management
• when evicting a page
  – throw it away if it’s old
  – otherwise, pick a machine to forward to
    » prob. of sending to M proportional to number of old pages on M
• when a node that had old pages runs out of old pages, stop and regather state
• good: only throws away old pages; fewer multi-migrations
• bad: cost of gathering state
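The two phases, gathering state and evicting, can be sketched as below. This is an illustration under stated assumptions: timestamps are plain numbers, the forwarding target excludes the evicting machine, and running out of candidates is reported with an exception. None of these details come from the slides.

```python
import random


def gather_state(last_access, old_percent=2):
    """Mark the oldest old_percent of all pages as old; count them per machine.

    last_access maps machine -> list of last-access timestamps of its pages.
    Returns (cutoff, counts); a page is old if its timestamp is <= cutoff.
    """
    stamps = sorted(t for pages in last_access.values() for t in pages)
    if not stamps:
        return None, {m: 0 for m in last_access}
    n_old = max(1, len(stamps) * old_percent // 100)
    cutoff = stamps[n_old - 1]
    counts = {m: sum(t <= cutoff for t in pages) for m, pages in last_access.items()}
    return cutoff, counts


def evict(page_stamp, cutoff, counts, here, rng):
    """Return None to discard the page, or the machine to forward it to."""
    if page_stamp <= cutoff:
        return None                        # old page: throw it away
    targets = [m for m in counts if m != here and counts[m] > 0]
    if not targets:
        raise LookupError("no old pages known elsewhere; regather state")
    # Probability of sending to M is proportional to the old-page count on M.
    weights = [counts[m] for m in targets]
    return rng.choices(targets, weights=weights, k=1)[0]
```

A forwarded page displaces one of the old pages at its destination, which is why it rarely needs to migrate a second time.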

Virtual Mainframe
• challenges are performance and single system image
• lots of work in commercial and research worlds on this
• case study: SHRIMP project
  – two generations built here at Princeton
    » focus on last generation
  – dual goals: parallel scientific computing and virtual mainframe apps

SHRIMP-3
[Figure: system architecture. Labels: Applications; Message passing libraries, Shared virtual memory, Fault-tolerance, Graphics, Scalable storage server, Performance measurement; a row of WinNT/Linux PCs, each with a Network Interface]

Performance Approach
• single user-level process on each machine
  – cooperate to provide single system image
  – client connects to any machine
• optimized user-level to user-level communication
  – low latency for control messages
  – high bandwidth for block transfers

Virtual Memory Mapped Communication
[Figure: groups of virtual address spaces (VA space 1 ... VA space N), each group attached to a Network Interface; the Network Interfaces are joined by the Network]

Communication Strategy
• separate permission checking from communication
  – establish “mapping” once
  – move data many times
• communication looks like local-to-remote memory copy
  – supported directly by hardware

Higher-Level Communication
• support sockets and RPC via specialized libraries
• calls do extra sender-to-receiver communication to coordinate data transfer
• bottom line for sockets
  – 15 microsecond latency
  – 90 Mbyte/sec bandwidth
  – much faster than alternatives

Pulsar Storage Service
[Figure: storage service layers. Labels: Fast communication; shared logical disk (x3); shared file system (x3); disk]

Single Network-Interface Image
• want to tell clients there is just one server, even when there are many
  – balance load automatically
• methods
  – DNS round-robin
  – IP-level routing
    » based on IP address of peer
    » dynamic, based on load
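The three selection methods can be contrasted in a small sketch. It only models the choice of server, not real DNS or packet routing; the function names, the modulo mapping of peer addresses, and the sample addresses are assumptions.

```python
import ipaddress
from itertools import cycle


def dns_round_robin(servers):
    """Hand out server addresses in rotation, one per lookup."""
    return cycle(servers)


def route_by_peer(peer_ip, servers):
    """Static routing: the same peer address always maps to the same server."""
    index = int(ipaddress.ip_address(peer_ip)) % len(servers)
    return servers[index]


def route_by_load(loads):
    """Dynamic routing: send the new connection to the least-loaded server."""
    return min(loads, key=lambda server: (loads[server], server))
```

With `servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]`, successive `next()` calls on `dns_round_robin(servers)` rotate through the list, while `route_by_peer` keeps a given client on one server. Only `route_by_load` reacts to the current load.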

Summary
• clusters of cheap machines can replace mainframes
  – keys: fast flexible communication, carefully implemented single system image
  – experience with databases too
• this method is becoming mainstream
• more work needed to make machines-on-desks model work