Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix.

Slides:



Advertisements
Similar presentations
The Effect of Network Total Order, Broadcast, and Remote-Write on Network- Based Shared Memory Computing Robert Stets, Sandhya Dwarkadas, Leonidas Kontothanassis,
Advertisements

Relaxed Consistency Models. Outline Lazy Release Consistency TreadMarks DSM system.
Multiple-Writer Distributed Memory. The Sequential Consistency Memory Model P1P2 P3 switch randomly set after each memory op ensures some serial order.
KERNEL MEMORY ALLOCATION Unix Internals, Uresh Vahalia Sowmya Ponugoti CMSC 691X.
Presented by Evan Yang. Overview of Munin  Distributed shared memory (DSM) system  Unique features Multiple consistency protocols Release consistency.
Distributed Shared Memory
Consistency Models Based on Tanenbaum/van Steen’s “Distributed Systems”, Ch. 6, section 6.2.
1 Release Consistency Slides by Konstantin Shagin, 2002.
1 Munin, Clouds and Treadmarks Distributed Shared Memory course Taken from a presentation of: Maya Maimon (University of Haifa, Israel).
Using DSVM to Implement a Distributed File System Ramon Lawrence Dept. of Computer Science
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
November 1, 2005Sebastian Niezgoda TreadMarks Sebastian Niezgoda.
Multiple Processor Systems Chapter Multiprocessors 8.2 Multicomputers 8.3 Distributed systems.
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
(Software) Distributed Shared Memory (aka Shared Virtual Memory)
Lightweight Logging For Lazy Release Consistent DSM Costa, et. al. CS /01/01.
Memory consistency models Presented by: Gabriel Tanase.
Multiprocessors CSE 471 Aut 011 Multiprocessors - Flynn’s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) –Conventional uniprocessor.
Memory Consistency Models
CSS434 DSM1 CSS434 Distributed Shared Memory Textbook Ch18 Professor: Munehiro Fukuda.
Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.
Distributed Shared Memory Systems and Programming
Shared Memory – Consistency of Shared Variables The ideal picture of shared memory: CPU0CPU1CPU2CPU3 Shared Memory Read/ Write The actual architecture.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel.
Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon.
View-Oriented Parallel Programming for multi-core systems Dr Zhiyi Huang World 45 Univ of Otago.
Distributed shared memory. What we’ve learnt so far  MapReduce/Dryad as a distributed programming model  Data-flow (computation as vertex, data flow.
Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z.
TECHNIQUES FOR REDUCING CONSISTENCY- RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS J. B. Carter University of Utah J. K. Bennett and W. Zwaenepoel.
TreadMarks Presented By: Jason Robey. Cool pic from last semester.
Synchronization Transformations for Parallel Computing Pedro Diniz and Martin Rinard Department of Computer Science University of California, Santa Barbara.
A Performance Comparison of DSM, PVM, and MPI Paul Werstein Mark Pethick Zhiyi Huang.
1 Lecture 13: LRC & Interconnection Networks Topics: LRC implementation, interconnection characteristics.
An Efficient Lock Protocol for Home-based Lazy Release Consistency Electronics and Telecommunications Research Institute (ETRI) 2001/5/16 HeeChul Yun.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
Distributed Shared Memory Based on Reference paper: Distributed Shared Memory, Concepts and Systems.
Distributed Memory and Cache Consistency (some slides courtesy of Alvin Lebeck)
Cache Coherence Protocols 1 Cache Coherence Protocols in Shared Memory Multiprocessors Mehmet Şenvar.
Distributed Shared Memory Presentation by Deepthi Reddy.
Distributed Shared Memory (part 1). Distributed Shared Memory (DSM) mem0 proc0 mem1 proc1 mem2 proc2 memN procN network... shared memory.
Implementation and Performance of Munin (Distributed Shared Memory System) Dongying Li Department of Electrical and Computer Engineering University of.
DISTRIBUTED COMPUTING
Page 1 Distributed Shared Memory Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation.
A Design of User-Level Distributed Shared Memory Zhi Zhai Feng Shen Computer Science and Engineering University of Notre Dame Oct. 27, 2009 Progress Report.
TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems Present By: Blair Fort Oct. 28, 2004.
Introduction to Software Distributed Shared Memory Systems Chang-Yi Lin 2004 / 02 / 26.
1 Chapter 9 Distributed Shared Memory. 2 Making the main memory of a cluster of computers look as though it is a single memory with a single address space.
Making a DSM Consistency Protocol Hierarchy-Aware: An Efficient Synchronization Scheme Gabriel Antoniu, Luc Bougé, Sébastien Lacour IRISA / INRIA & ENS.
Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z. By Nooruddin Shaik.
OpenMP for Networks of SMPs Y. Charlie Hu, Honghui Lu, Alan L. Cox, Willy Zwaenepoel ECE1747 – Parallel Programming Vicky Tsang.
1 March 17, 2006Zhiyi’s RSL VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing Dr Zhiyi Huang Dept of Computer Science University.
August 13, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 11: Multiprocessors: Uniform Memory Access * Jeremy R. Johnson Monday,
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems Isaac Gelado, Javier Cabezas. John Stone, Sanjay Patel, Nacho Navarro.
Distributed Memory and Cache Consistency (some slides courtesy of Alvin Lebeck)
Software Coherence Management on Non-Coherent-Cache Multicores
Distributed Shared Memory
Relaxed Consistency models and software distributed memory
Reactive Synchronization Algorithms for Multiprocessors
Chapter 10 Distributed Shared Memory
Implementing Simplified Molecular Dynamics Simulation in Different Parallel Paradigms Chao Mei April 27th, 2006 CS498LVK.
Pete Keleher, Alan L. Cox, Sandhya Dwarkadas and Willy Zwaenepoel
Distributed Shared Memory
An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language Martin C. Rinard University.
Outline Midterm results summary Distributed file systems – continued
Implementing an OpenMP Execution Environment on InfiniBand Clusters
Exercises for Chapter 16: Distributed Shared Memory
Parallel Algorithm Models
A Novel Home Migration Protocol in Home-based DSM
Presentation transcript:

Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix Conference Jun Lee

DSM (distributed shared memory)  A software system for parallel computation Shares distributed memories Easier programming −Provide a single global address space

DSM (distributed shared memory)  No widely available DSM implementations In-house research platforms Kernel modifications Poor performance −Imitating consistency protocols of hardware −False sharing

Treadmarks  Objectives Commercially available workstations and OS −Standard Unix system on DECstation Efficient user-level DSM implementation −Reduce communication overhead  Design LRC (lazy release consistency) Multiple writer protocol Lazy diff creation

Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):1

Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):? R(a):1 Big problem with page size granularity

Page X Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer W(x0):a W(x1):b a W(x2):c W(x3):d P0P1 a Page X bbccd False sharing

Consistency protocol (RC)  Release Consistency Relaxed memory consistency model −delay making its changes visible to other processors until certain synchronization accesses occurs Synchronization points −Acquire(), Release() (similar to locks, barriers) Two types −ERC (eager), LRC (lazy)

Consistency protocol (RC)  Release Consistency Acquire() and release() are sequentially consistent −Release() is performed after all previous operations have completed −Operations are performed after previous acquire() have been performed Acquire() and release() pair between conflicting accesses −SC and RC produce the same results.

Consistency protocol (RC)  ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1 Write Notice

Consistency protocol (RC)  ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):1 Release(L) Acquire(K) Release(K)

Consistency protocol (RC)  LRC The delivery is postponed until the acquire Fewer messages than ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1

Consistency protocol (RC)  ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC

Consistency protocol (RC)  ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 LRC

Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abc Acquire(L) Release(L) Acquire(L) ac W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing

Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d ac

Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac

Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac Twin X b ac Diff b diff ac interval

Implementation

Etc.  Lock & barrier Statically assigned manager  Garbage collection reclaim the space used by write notice records, interval records, and diffs Triggered when the free space drops below a threshold

Evaluation  Experimental Environment 8 DECstation-5000/240 connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet  Applications Water – molecular dynamics simulation Jacobi – Successive Over-Relaxation TSP – branch & bound algorithm to solve the traveling salesman problem Quicksort – using bubblesort to sort subarray of less than 1K element ILINK – genetic linkage analysis

Evaluation Speedup Execution statistics

Evaluation Execution time breakdown

Evaluation Unix overhead breakdownTreadMarks overhead breakdown

Evaluation Execution time breakdown for Water

Evaluation ERC vs. LRC Speedup Message rate

Evaluation ERC vs. LRC Data rate Diff creation rate

Implementation... pages time stamp pages time stamp 000 Acq(L) P0 sideP1 side P P

Implementation... pages time stamp pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a W(b) b twin P0 W.N P1 W.N P P

Implementation... pages time stamp pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P1 W.N

Implementation... pages time stamp pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P0 100 P1 W.N

Implementation... pages time stamp pages time stamp 110 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) W(b) b twin P0 W.N P0 W.N P1 b diff a P a W.N P0P1 100

Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 110 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N P twin b a

Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 120 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N twin b a

Implementation P1 sideP2 side P0P pages time stamp 000 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a

Implementation P1 sideP2 side P0P pages time stamp 111 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a P0 W.N P1 W.N P1 W.N

Thank you ! Any questions?