TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS
J. B. Carter, University of Utah
J. K. Bennett and W. Zwaenepoel, Rice University

INTRODUCTION
Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
The key issue in building a software DSM is minimizing the amount of data communication among the workstation memories

Why bother with DSM?
Key idea is to build fast parallel computers that
– are cheaper than conventional architectures
– are convenient to use
Conventional parallel computer architecture was the shared memory multiprocessor

Conventional parallel architecture
[Figure: several CPUs, each with its own cache, connected to a single shared memory]

Today's architecture
Clusters of workstations are much more cost effective
– No need to develop complex bus and cache structures
– Can use off-the-shelf networking hardware (Gigabit Ethernet, Myrinet at 1.5 Gb/s)
– Can quickly integrate the newest microprocessors

Limitations of the cluster approach
Communication within a cluster of workstations is through message passing
– Much harder to program than concurrent access to a shared memory
Many big programs were written for shared memory architectures
– Converting them to a message passing architecture is a nightmare

Distributed shared memory
DSM = the main memories of the workstations combined into one shared global address space
[Figure: each workstation's memory maps into a single shared address space accessible over the network]

Distributed shared memory
DSM makes a cluster of workstations look like a shared memory parallel computer
– Easier to write new programs
– Easier to port existing programs
Key problem is that DSM only provides the illusion of having a shared memory architecture
– Data must still move back and forth among the workstations

Characterizing a DSM (I)
Four important issues:
1. Size of transfer units (level of granularity)
– Big units (virtual memory pages) are more efficient
– Can cause false sharing whenever a page contains different variables that are accessed at the same time by different processors

False sharing
[Figure: one processor accesses x while another accesses y; the page containing both x and y will move back and forth between the main memories of the workstations]
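To make the picture concrete, here is a minimal C sketch (an illustration, not Munin code; the variable and function names are made up) of two logically unrelated counters that the compiler or linker may place on the same virtual-memory page. If one processor only updates x and another only updates y, a page-grained DSM will still ship the whole page back and forth between their memories.

    /* Illustration only: x and y are unrelated, but may end up on the same page. */
    int x;   /* written only by the process running worker_a() */
    int y;   /* written only by the process running worker_b() */

    void worker_a(void) { for (int i = 0; i < 100000; i++) x++; }
    void worker_b(void) { for (int i = 0; i < 100000; i++) y++; }

Padding or aligning each variable to its own page avoids the problem at the cost of wasted memory; Munin's write-shared protocol (described later) attacks it differently, by letting both copies be updated and merging the changes.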

Characterizing a DSM (II)
2. Consistency model
Strict consistency is not possible
Various authors have proposed weak consistency models
– Cheaper to implement
– Harder to use in a correct fashion

Characterizing a DSM (III)
3. Portability of programs
Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications ("dusty decks")
More efficient DSMs require more changes
4. Portability of the DSM
Some DSMs require specific OS features

MUNIN
Developed at Rice University
Based on software objects (variables)
Uses the processor's virtual memory hardware to detect accesses to the shared objects
Includes several techniques for reducing consistency-related communication
Only runs on top of the V kernel

Key features
– Software release consistency: only requires the memory to be consistent at specific synchronization points
– Multiple consistency protocols: allow the user to select the best consistency protocol for each data item
– Write-shared protocols: reduce false sharing
– An update-with-timeout mechanism: stops sending updates to replicas that are no longer being used

SW RELEASE CONSISTENCY (I)
Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
– P(&mutex) and V(&mutex)
– lock(&csect) and unlock(&csect)
– request() and release()
Unprotected accesses can produce unpredictable results

SW RELEASE CONSISTENCY (II)
SW release consistency only guarantees correctness of operations performed within a request/release pair
– No need to propagate new values of shared variables until the release
– Must guarantee that a workstation has received the most recent values of all shared variables when it completes a request

SW RELEASE CONSISTENCY (III)
Process 1:
  shared int x;
  request( );
  x = 1;
  release( );   // propagate x = 1

Process 2:
  shared int x;
  request( );   // wait for new value of x
  x++;
  release( );   // propagate x = 2
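A hedged expansion of the fragment above into plain C: the request()/release() names come from the slides, but the header name, lock type, and lock variable are assumptions made only to show the usage pattern, not Munin's actual API.

    #include "munin.h"         /* hypothetical DSM library header */

    int x;                     /* declared as a shared data variable in Munin */
    lock_t x_lock;             /* synchronization variable (assumed type)     */

    void process_1(void) {
        request(&x_lock);      /* acquire: get the most recent shared values */
        x = 1;
        release(&x_lock);      /* release: the new value x = 1 is propagated */
    }

    void process_2(void) {
        request(&x_lock);      /* blocks until process_1's release is visible */
        x++;                   /* x becomes 2 */
        release(&x_lock);      /* propagates x = 2 */
    }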

SW RELEASE CONSISTENCY (IV)
Munin uses eager release: new values of shared variables are propagated at release time
– Lazy release delays propagation until a request is issued (as in TreadMarks)
A workstation issuing a request gets the current values of all shared variables
– Shared variables are not associated with a particular critical section (as they are in Midway)

Munin Implementation (I)
Three kinds of variables:
1. Ordinary variables: can only be accessed by the process that created them
2. Shared data variables: should always be accessed from within critical regions
3. Synchronization variables: locks, barriers, or condition variables; must be accessed through special library procedures

Munin Implementation (II)
When a processor modifies shared data inside a critical region, all update messages are buffered and delayed until the processor leaves the critical region
Processes accessing shared data variables outside critical regions do so at their own risk
– Same as with the shared memory model, but the risk is higher
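As a rough sketch of what "buffering and delaying update messages" could look like inside the runtime (an assumption-laden illustration, not the actual Munin source; send_update() is a made-up routine):

    #include <stddef.h>

    #define MAX_PENDING 256

    extern void send_update(void *page, size_t len);    /* hypothetical network send */

    struct update { void *page; size_t len; };
    static struct update pending[MAX_PENDING];          /* delayed update messages */
    static int npending = 0;
    static int in_critical_region = 0;

    /* Called by the write-fault handler when shared data is modified. */
    void note_update(void *page, size_t len) {
        if (in_critical_region && npending < MAX_PENDING) {
            pending[npending].page = page;               /* delay instead of sending */
            pending[npending].len  = len;
            npending++;
        } else {
            send_update(page, len);                      /* outside a region: send now */
        }
    }

    /* Called when the processor leaves the critical region (eager release). */
    void leave_critical_region(void) {
        for (int i = 0; i < npending; i++)
            send_update(pending[i].page, pending[i].len);
        npending = 0;
        in_critical_region = 0;
    }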

FOUR CONSISTENCY PROTOCOLS
1. Conventional shared variables:
– Replicated on demand
– Single writer/multiple readers policy, using an invalidation-based protocol
2. Read-only variables:
– Replicated on demand
– Any attempt to modify them will result in a runtime error

FOUR CONSISTENCY PROTOCOLS
3. Migratory variables:
– Migrate among the processes accessing them
– Every process accessing them always gets full read and write access
4. Write-shared variables:
– Can be updated concurrently because different processes access different portions of the page

Implementation
The programmer uses annotations to specify any of the last three consistency protocols
– Read-only variables
– Migratory variables
– Write-shared variables
Incorrect annotations may result in poor performance or in runtime errors, but not in incorrect results
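Since the exact Munin annotation syntax is not shown in these slides, the following sketch only illustrates the idea using empty C macros of my own invention: each shared variable is tagged at declaration time with the protocol the runtime should use for it.

    /* Hypothetical annotation macros, NOT real Munin syntax. */
    #define READ_ONLY      /* replicated on demand; writes cause runtime errors     */
    #define MIGRATORY      /* moves to whichever process is currently accessing it  */
    #define WRITE_SHARED   /* concurrent writers allowed; changes merged by diffing */

    READ_ONLY    static const double coefficients[64];    /* initialized once, then only read   */
    MIGRATORY    static struct { int head, tail; int items[128]; } work_queue;   /* used by one process at a time */
    WRITE_SHARED static double grid[1024];                /* different processes write different parts of the page */

A wrong annotation (say, marking a frequently written variable READ_ONLY) would surface as a runtime error or as extra communication, but, as the slide notes, never as a silently incorrect result.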

WRITE-SHARED PROTOCOL (I)
Designed to fight false sharing
Uses a copy-on-write mechanism
Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
The first attempt to modify the contents of the page results in the creation of a copy of the page being modified (the twin)

Example
[Figure: before the first write access, the page contains x = 1, y = 2; the first write creates a twin holding x = 1, y = 2, and the write proceeds on the page, which now holds x = 3, y = 2; comparing the page with its twin at release shows that the new value of x is 3]

WRITE-SHARED PROTOCOL (II)
At release time, the DSM performs a word-by-word comparison of the page and its twin, stores the diff in the space used by the twin page, and notifies all processors having a copy of the shared data of the update
A runtime switch can be set to check for conflicting updates to write-shared data
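A minimal C sketch of the diff step under assumed data layouts (4 KB pages, 32-bit words; the function and structure names are invented): compare the dirty page with its twin word by word and record each changed word as an (offset, new value) pair.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_WORDS (4096 / sizeof(uint32_t))

    struct diff_entry { uint32_t offset; uint32_t value; };

    /* Returns the number of modified words written into diff[]. */
    size_t make_diff(const uint32_t *page, const uint32_t *twin,
                     struct diff_entry *diff) {
        size_t n = 0;
        for (size_t i = 0; i < PAGE_WORDS; i++) {
            if (page[i] != twin[i]) {          /* this word was modified locally */
                diff[n].offset = (uint32_t)i;
                diff[n].value  = page[i];
                n++;
            }
        }
        return n;   /* the diff can be stored in the space occupied by the twin */
    }

The diff, rather than the whole page, is what gets sent to the other processors holding a copy, which is what makes concurrent writers to different parts of a page cheap.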

UPDATE TIME-OUT MECHANISM
Munin tries not to keep sending updates to processors holding stale replicas they no longer use
Anytime a processor receives an update for a page for which it does not have a twin, the page is marked supervisor-only and the time of receipt of the update is recorded
The first local access to the page causes a trap that removes the restriction

UPDATE TIME-OUT MECHANISM
When a process receives an update for a page that is still marked supervisor-only, it checks the timestamp of the last update
If more than 50 ms have elapsed, the process notifies the originator of the update not to send more updates and invalidates the page
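A sketch of that check in C (the 50 ms threshold comes from the slide; the page_state structure and the millisecond clock are assumptions):

    #include <stdbool.h>
    #include <stdint.h>

    #define STALE_THRESHOLD_MS 50

    struct page_state {
        bool     supervisor_only;   /* set when an update arrives for a page with no twin  */
        uint64_t last_update_ms;    /* time the previous update for this page was received */
    };

    /* Called when an update arrives; returns true if the local replica should be
     * invalidated and the sender told to stop sending further updates. */
    bool handle_update(struct page_state *pg, uint64_t now_ms) {
        if (pg->supervisor_only &&
            now_ms - pg->last_update_ms > STALE_THRESHOLD_MS)
            return true;            /* page unused locally for too long */
        pg->supervisor_only = true; /* cleared by the trap on the next local access */
        pg->last_update_ms  = now_ms;
        return false;
    }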

CONCLUSIONS (I)
The strongest point of Munin is its excellent performance
– typically within 5 to 33% of the performance of hand-coded message passing versions of the same programs
Its major limitation is its dependence on some features of the V kernel

CONCLUSIONS (II)
Munin requires programs to access shared data from within critical regions or after barriers
– Appears to be a reasonable requirement
Munin allows users to tune the performance of their programs by selecting the best consistency protocol for each shared variable
– Can quickly become a tedious process

FURTHER DEVELOPMENTS
The same team has developed a successor to Munin named TreadMarks
Key differences are:
– TreadMarks uses a more complex lazy release protocol
– TreadMarks is UNIX-based, and therefore more portable