Distributed Shared Memory: A Survey of Issues and Algorithms. B. Nitzberg and V. Lo, University of Oregon.

INTRODUCTION
Distributed shared memory (DSM) is a software abstraction that allows a set of workstations connected by a LAN to share a single paged virtual address space.

Why bother with DSM? The key idea is to build fast parallel computers that are
– Cheaper than shared-memory multiprocessor architectures
– As convenient to use as a shared-memory machine

[Figure: conventional parallel architecture: multiple CPUs, each with its own cache, connected to a single shared memory]

Today's architecture
Clusters of workstations are much more cost-effective:
– No need to develop complex bus and cache structures
– Can use off-the-shelf networking hardware (Gigabit Ethernet, Myrinet at 1.5 Gb/s)
– Can quickly integrate the newest microprocessors

Limitations of the cluster approach
Communication within a cluster of workstations is through message passing
– Much harder to program than concurrent access to a shared memory
Many big programs were written for shared-memory architectures
– Converting them to a message-passing architecture is a nightmare

Distributed shared memory
DSM = one shared global address space built out of the main memories of the individual workstations

Distributed shared memory
DSM makes a cluster of workstations look like a shared-memory parallel computer
– Easier to write new programs
– Easier to port existing programs
The key problem is that DSM only provides the illusion of having a shared-memory architecture
– Data must still move back and forth among the workstations

Basic approaches
Hardware implementations:
– Use extensions of the traditional hardware caching architecture
Operating system/library implementations:
– Use virtual memory mechanisms
Compiler implementations:
– Compiler handles all shared accesses

Design Issues (I)
1. Structure and granularity
– Big units are more efficient (e.g., virtual memory pages)
– Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors

[Figure: false sharing: one processor accesses x while another accesses y; the page containing both x and y moves back and forth between the main memories of the two workstations]

Design Issues (II)
1. Structure and granularity (cont'd)
– Shared objects can also be
  Objects from a distributed object-oriented system
  Data types from an existing language

Design Issues (III)
2. Coherence semantics
– Strict consistency is not possible
– Various authors have proposed weaker consistency models
  Cheaper to implement
  Harder to use correctly

Design Issues (IV)
3. Scalability
– Potentially very high, but limited by
  Central bottlenecks
  Operations requiring global knowledge and storage

Design Issues (V)
4. Heterogeneity
– Possible but complex to implement

Portability Issues
Portability of programs
– Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications ("dusty decks")
– More efficient DSMs require more changes
Portability of the DSM itself
– Some DSMs require specific OS features
(Not in paper)

Implementation Issues (I)
1. Data location and access: can either
– Keep data in a single centralized location
– Let data migrate (better), but must then have a way to locate them
  Centralized server (a bottleneck)
  A "home" node associated with each piece of data that keeps track of its location

Implementation Issues (II)
1. Data location and access (cont'd): can either
– Maintain a single copy of each piece of data
– Replicate it on demand
If data are replicated, must either
– Propagate updates to all replicas
– Use an invalidation protocol

[Figure: invalidation protocol: before the update, both copies hold X = 0; at update time, one copy becomes X = 5 and the other copy is marked INVALID]

Main advantage
Locality of updates:
– A page that is being modified has a high likelihood of being modified again
The invalidation mechanism minimizes consistency overhead
– A single invalidation replaces many updates

A realization: Munin
Developed at Rice University
Based on software objects (variables)
Used the processor's virtual memory hardware to detect accesses to the shared objects
Included several techniques for reducing consistency-related communication
Only ran on top of the V kernel

Munin main strengths
Excellent performance
Portability of programs
– Allowed programs written for a multiprocessor architecture to run on a cluster of workstations with a minimal number of changes ("dusty decks")

Munin main weakness
Very poor portability of Munin itself
– Depended on some features of the V kernel
Not maintained since the late 1980s

Consistency model
Munin uses software release consistency
– Only requires the memory to be consistent at specific synchronization points

SW release consistency (I)
Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
– P(&mutex) and V(&mutex)
– lock(&csect) and unlock(&csect)
– acquire( ) and release( )
Unprotected accesses can produce unpredictable results

SW release consistency (II)
SW release consistency only guarantees the correctness of operations performed within an acquire/release pair
– No need to export the new values of shared variables until the release
– Must guarantee that a workstation has received the most recent values of all shared variables when it completes an acquire

SW release consistency (III)
Processor 1:
  shared int x;
  acquire( );
  x = 1;
  release( );   // export x = 1
Processor 2:
  shared int x;
  acquire( );   // wait for the new value of x
  x++;
  release( );   // export x = 2

SW release consistency (IV)
Must still decide how to release updated values
– Munin uses eager release: new values of shared variables are propagated at release time

SW release consistency (V)
[Figure: eager release: each release forwards the update to the two other processors]

Multiple-writer protocol
Designed to fight false sharing
Uses a copy-on-write mechanism
– Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
– The first attempt to modify the contents of the page results in the creation of a copy of the page as it was before the modification (the twin)

Creating a twin (Not in paper)

Example (not in paper):
– Before the first write access, the page contains x = 1, y = 2; the write creates a twin with the same contents
– After the write, the page contains x = 3, y = 2
– Comparing the page with its twin shows that the new value of x is 3

Other DSM Implementations (I)
Software release consistency with lazy release (TreadMarks):
– Faster, and designed to be portable
Sequentially consistent software DSM (IVY):
– Sends messages to the other copies at each write
– Much slower

Other DSM Implementations (II)
Entry consistency (Midway):
– Requires each variable to be associated with a synchronization object (typically a lock)
– Acquire/release operations on a given synchronization object only involve the variables associated with that object
– Requires less data traffic
– Does not handle dusty decks well

Other DSM Implementations (III)
Structured DSM systems (Linda):
– Offer the programmer a shared tuple space accessed through specific synchronized methods
– Require a very different programming style

TODAY'S IMPACT
Very low:
– According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM"
– DSM competed with the OpenMP model, and OpenMP won