TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel


Agenda
• DSM Overview
• TreadMarks Overview
• Vector Clocks
• Multi-writer Protocol (diffs)
• TreadMarks Algorithm
• Implementation
• Limitations

DSM Overview
• A global address space virtualizes the disparate physical memories of the machines
• Programs use normal thread and locking techniques; no explicit message passing (e.g. MPI)
[Figure: processors with local memories connected by a network]

DSM Overview
• Communication overhead is incurred to keep the memories synchronized
• To improve performance, maximize parallel computation and limit communication
[Figure: processors with local memories connected by a network]

TreadMarks Overview
• Minimize communication to improve DSM performance:
  - Lazy release consistency (vector clocks)
  - Multiple writers (lazy diff creation)
• Delay communication as long as possible, possibly avoiding it entirely

TreadMarks Overview: Release Consistency
• Release consistency: shared memory updates must be visible to another processor by the time the release becomes visible to it
• No need to send updates immediately upon each write
[Figure: P1 writes x; the update is propagated at the release]

TreadMarks Overview: Lazy Release Consistency
• Lazy release consistency: shared memory updates are not made visible until the time of the acquire
• No update is propagated if it is never acquired
[Figure: P1 writes x; the update is propagated only at P2's acquire]

Vector Clocks
• A logical clock mechanism for identifying the causal ordering of events in a distributed system
• Mattern (1989) and Fidge (1991)
[Figure: three process timelines, P1, P2, P3]

Vector Clocks
• Each process maintains a vector of counters, one per process in the system
[Figure: each of P1, P2, P3 holds its own vector of three counters]

Vector Clocks
• A process increments its own counter upon a local event
[Figure: P1's own counter advances on a local event]

Vector Clocks
• Upon receiving a message, a process increments its own counter and updates every other counter to the element-wise maximum of its vector and the sender's
[Figure: a message from P1 to P2 carries P1's vector; P2 merges it into its own]
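These update rules translate almost directly into code. A minimal sketch in C, assuming a fixed process count NPROCS; the names are illustrative, not from the TreadMarks source:

```c
#include <stdio.h>

#define NPROCS 3  /* illustrative fixed system size */

typedef struct { int t[NPROCS]; } vtime;

/* Local event (or message send): increment own counter. */
void vt_tick(vtime *v, int self) { v->t[self]++; }

/* Message receive: take the element-wise maximum with the
   sender's vector, then increment own counter. */
void vt_recv(vtime *v, const vtime *msg, int self) {
    for (int i = 0; i < NPROCS; i++)
        if (msg->t[i] > v->t[i]) v->t[i] = msg->t[i];
    v->t[self]++;
}

/* a "happened before" b iff a <= b element-wise and a != b. */
int vt_before(const vtime *a, const vtime *b) {
    int strictly = 0;
    for (int i = 0; i < NPROCS; i++) {
        if (a->t[i] > b->t[i]) return 0;
        if (a->t[i] < b->t[i]) strictly = 1;
    }
    return strictly;
}

int main(void) {
    vtime p1 = {{0}}, p2 = {{0}};
    vt_tick(&p1, 0);       /* P1: local event, then send */
    vt_recv(&p2, &p1, 1);  /* P2: receive P1's message */
    printf("P1 before P2: %d\n", vt_before(&p1, &p2));  /* prints 1 */
    return 0;
}
```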

Diff Creation
• On the first write to a shared page, the writer retains an unmodified copy of the page (a twin)
[Figure: P1 saves a copy of the page before modifying it]

Diff Creation
• A diff is created by comparing the modified page against the retained copy (under eager release consistency, this happens at each release)
[Figure: P1 compares the modified page against its twin]

Diff Creation
• The diff is sent to the other processes
[Figure: P1 sends the diff to P2]

Lazy Diff Creation
• Diffs are created only when a page is invalidated, or when its modifications are requested explicitly (an access miss on an invalidated page)
[Figure: P2's access miss triggers diff creation at P1]
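A minimal sketch of the twin-and-diff step in C. The byte-wise comparison and run format are simplifications chosen here for brevity, and the function names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* First write to a page: retain an unmodified copy (the twin). */
unsigned char *make_twin(const unsigned char *page) {
    unsigned char *twin = malloc(PAGE_SIZE);
    memcpy(twin, page, PAGE_SIZE);
    return twin;
}

/* Lazily, when the diff is actually needed: encode the page's
   modifications as runs of changed bytes.  A real implementation
   would append (offset, length, data) records to a diff pool;
   here we just report the run boundaries. */
void make_diff(const unsigned char *twin, const unsigned char *page) {
    int i = 0;
    while (i < PAGE_SIZE) {
        if (twin[i] == page[i]) { i++; continue; }
        int start = i;
        while (i < PAGE_SIZE && twin[i] != page[i]) i++;
        printf("run: offset=%d length=%d\n", start, i - start);
    }
}

int main(void) {
    unsigned char page[PAGE_SIZE] = {0};
    unsigned char *twin = make_twin(page);
    page[10] = 1; page[11] = 2; page[100] = 3;  /* simulated writes */
    make_diff(twin, page);  /* run at 10 (len 2), run at 100 (len 1) */
    free(twin);
    return 0;
}
```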

TreadMarks Algorithm
• P1 cannot proceed past an acquire until all modifications have been received from processes whose vector timestamps are smaller than P1's
[Figure: timelines for P1, P2, P3]

TreadMarks Algorithm
• On acquire:
  - P1 sends its vector timestamp to the releaser
  - P2 (the releaser) attaches invalidations for all intervals with updated counters, i.e. those P1 has not yet seen
  - P2 sends its updated vector timestamp back along with the invalidations
[Figure: P1's acquire request travels to P2; the reply carries invalidations]
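The releaser's selection rule is compact: a write notice must be sent iff the acquirer's vector timestamp does not yet cover the interval it belongs to. A hedged sketch in C; the record layout and names are ours, not TreadMarks':

```c
#include <stdio.h>

#define NPROCS 3

typedef struct { int t[NPROCS]; } vtime;

/* Simplified write notice: a page modified during interval
   `counter` of process `proc`. */
typedef struct { int proc, counter, page; } write_notice;

/* Releaser side of an acquire: select the notices the acquirer
   has not yet seen; the acquirer invalidates those pages. */
int notices_for_acquirer(const write_notice *wn, int n,
                         const vtime *acq, int *pages_out) {
    int k = 0;
    for (int i = 0; i < n; i++)
        if (wn[i].counter > acq->t[wn[i].proc])
            pages_out[k++] = wn[i].page;
    return k;
}

int main(void) {
    write_notice wn[] = { {1, 3, 7}, {2, 1, 9} };
    vtime acq = {{2, 2, 1}};  /* acquirer has seen P2 up to 2, P3 up to 1 */
    int pages[8];
    int k = notices_for_acquirer(wn, 2, &acq, pages);
    for (int i = 0; i < k; i++)
        printf("invalidate page %d\n", pages[i]);  /* page 7 only */
    return 0;
}
```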

TreadMarks Algorithm
• Diffs are generated when:
  - An invalidation is received (i.e. when P1 had also made prior updates to this page)
  - The page is accessed (an access miss)
[Figure: P2's invalidation and the w(x) access miss trigger diff creation]

TreadMarks Implementation: Data Structures
• Page array: one entry per shared page, holding the write notice records for that page organized by proc_id
• Proc array: one entry per process, heading that process's list of interval records (each tagged with a VC counter)
• Write notice records: mark a page as modified during an interval
• Diff pool: storage for the diffs once they are created
[Figure: page array and proc array linking write notice and interval records]
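The relationships in the slide's diagram can be approximated with a few C structs. A sketch assuming fixed NPROCS/NPAGES; the field names are illustrative, not the TreadMarks source:

```c
#define NPROCS 3
#define NPAGES 1024

struct diff;  /* encoded page modifications (lives in the diff pool) */

/* A page was modified during some interval; the diff pointer is
   filled in lazily, once the diff is first needed. */
struct write_notice {
    int page;
    struct diff *diff;
    struct write_notice *next;
};

/* One interval per (process, VC counter) pair, listing the pages
   that process modified during the interval. */
struct interval {
    int vc[NPROCS];                /* vector timestamp of the interval */
    struct write_notice *notices;
    struct interval *next;
};

/* Page array entry: per-page state plus, per process, the write
   notices known for that page. */
struct page_entry {
    enum { PAGE_VALID, PAGE_INVALID } state;
    unsigned char *twin;                   /* copy made on first write */
    struct write_notice *notices[NPROCS];  /* indexed by proc_id */
};

struct page_entry page_array[NPAGES];
struct interval  *proc_array[NPROCS];      /* per-process interval lists */
```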

TreadMarks Implementation: Locks
• Each lock is statically assigned a manager (round-robin), which keeps track of which processor last requested the lock
• Lock acquire requests are sent to the manager and forwarded to the last processor to obtain the lock
• Upon release, the releaser sends, for each interval:
  - the processor ID and vector timestamp
  - any invalidations that are necessary
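A minimal sketch of the static manager assignment and the forwarding step, with illustrative names:

```c
#include <stdio.h>

#define NPROCS 3
#define NLOCKS 8

/* Locks are statically assigned to managers round-robin. */
int lock_manager(int lock) { return lock % NPROCS; }

/* Manager state: the last processor to request each lock.  An
   acquire request is forwarded there, and the requester becomes
   the new tail of the (distributed) queue. */
int last_requester[NLOCKS];  /* zero-initialized: P0 starts as holder */

int forward_acquire(int lock, int requester) {
    int target = last_requester[lock];
    last_requester[lock] = requester;
    return target;
}

int main(void) {
    printf("lock 5 managed by P%d\n", lock_manager(5));  /* P2 */
    printf("forward to P%d\n", forward_acquire(5, 1));   /* P0 */
    printf("forward to P%d\n", forward_acquire(5, 2));   /* P1 */
    return 0;
}
```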

TreadMarks Implementation: Barriers
• Centralized barrier with a manager
• Upon arrival at the barrier, each client notifies the manager of the intervals the manager does not already have; these are incorporated when the manager arrives at the barrier
• When all clients have arrived, the manager notifies each client of the intervals that client does not already have
• Expensive
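One piece of this exchange, the manager folding a client's interval records into its own view, sketched with the same simplified records as in the earlier sketches; the symmetric reply step (sending each client the intervals it lacks) uses the same comparison:

```c
#include <stdio.h>

#define NPROCS 3

typedef struct { int t[NPROCS]; } vtime;
typedef struct { int proc, counter; } interval;

/* Manager side of a barrier arrival: only intervals the manager's
   vector timestamp has not yet covered advance it. */
void merge_intervals(vtime *mgr, const interval *ivs, int n) {
    for (int i = 0; i < n; i++)
        if (ivs[i].counter > mgr->t[ivs[i].proc])
            mgr->t[ivs[i].proc] = ivs[i].counter;
}

int main(void) {
    vtime mgr = {{4, 1, 0}};
    interval from_client[] = { {1, 3}, {2, 2} };  /* P2's 3rd, P3's 2nd */
    merge_intervals(&mgr, from_client, 2);
    printf("manager time: %d %d %d\n", mgr.t[0], mgr.t[1], mgr.t[2]);
    /* prints: manager time: 4 3 2 */
    return 0;
}
```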

Limitations
• Achieved nearly linear speedup for the TSP, Jacobi, Quicksort, and ILINK programs
• Water: each molecule in the simulation is protected by a lock and frequently accessed
  - Barriers are used for synchronization
  - Speedup is limited by the algorithm's low computation-to-communication ratio (many fine-grained messages)

Limitations
• TSP: eager release consistency performs better than lazy release consistency (Fig. 9)
  - Under LRC, updates occur only on invalidations and access misses (writes/synchronization points)
  - The TSP algorithm reads the stale 'current minimum' value without synchronization

Limitations
• Depends on events (writes/synchronization) to trigger consistency operations
• More opportunities to read stale data (as in TSP)
• Reduced redundancy increases the risk of data loss

Summary
• Improves performance by improving the computation-to-communication ratio
• Delays consistency updates until an acquire and subsequent page access require them
• Weaker consistency implies a greater likelihood of reading stale data, and of data loss
• Procrastination = Performance