Relaxed Consistency Models

Outline ◦Lazy Release Consistency ◦TreadMarks DSM system

Review: what makes a good consistency model? A model is a contract between the memory system and the programmer ◦the programmer follows some rules about reads and writes ◦the model provides guarantees in return A model embodies a tradeoff ◦intuitive for the programmer vs. can be implemented efficiently

TreadMarks high-level goals ◦better DSM performance ◦run existing parallel (and “correct”) code

What specific problems with IVY does TreadMarks want to fix? False sharing: two machines use different variables that happen to live on the same page, and at least one of them writes ◦IVY bounces the page back and forth between them ◦but this is unnecessary when the two processes (threads) are working on different variables Whole-page transfers are wasteful: send only the written bytes, not whole pages

Goal 1: reducing the data to be sent Goal: don’t send the whole page, just the written bytes. On M1 write fault: ◦tell other hosts to invalidate, but keep a hidden copy ◦M1 itself also keeps a hidden copy On M2 fault: ◦M2 asks M1 for recent modifications ◦M1 “diffs” the current page against its hidden copy ◦M1 sends the differences to M2 ◦M2 applies the diffs to its hidden copy to build the up-to-date version
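
A minimal sketch of this diff mechanism in C; the word granularity, entry format, and function names are illustrative assumptions, not TreadMarks’ actual encoding:

    /* Sketch of page diffing against a hidden copy ("twin"). */
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_WORDS (4096 / sizeof(uint32_t))

    /* One diff entry: offset of a changed word and its new value. */
    typedef struct { uint16_t off; uint32_t val; } diff_entry;

    /* On a request: compare the current page against its hidden copy
     * and record only the words that changed. Returns entry count. */
    size_t make_diff(const uint32_t *twin, const uint32_t *page,
                     diff_entry *out) {
        size_t n = 0;
        for (size_t i = 0; i < PAGE_WORDS; i++)
            if (twin[i] != page[i]) {
                out[n].off = (uint16_t)i;
                out[n].val = page[i];
                n++;
            }
        return n;
    }

    /* On the faulting machine: apply received diffs to the stale
     * hidden copy to reconstruct the up-to-date page. */
    void apply_diff(uint32_t *page, const diff_entry *d, size_t n) {
        for (size_t i = 0; i < n; i++)
            page[d[i].off] = d[i].val;
    }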

Goal 2: allow multiple readers and writers To cope with false sharing: ◦no invalidation when a machine writes ◦no r/w → r/o demotion when a machine reads ◦so there will be multiple “different” copies of a page! Which should a reader look at? Diffs help here: they let us merge writes to the same page But when should the diffs be sent? ◦no invalidations → no page faults → so what triggers sending diffs?

Release Consistency Think about how you write multi-threaded code: before accessing shared data you first acquire a lock, then access the data, and finally release the lock. This is the “correct” programming practice. In a distributed environment, suppose we have a lock server: each process must get a lock from the lock server before accessing shared resources. So we can send out write diffs on release, to all copies of the pages that were written. This is a new consistency model!
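
The locking discipline this model assumes, shown as a minimal pthreads sketch; in a DSM the lock would come from the lock server rather than a local mutex:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int shared_counter;        /* stands in for shared DSM data */

    void *worker(void *arg) {
        (void)arg;
        pthread_mutex_lock(&m);       /* acquire before touching shared data */
        shared_counter++;             /* access while holding the lock */
        pthread_mutex_unlock(&m);     /* release: RC may push diffs here */
        return NULL;
    }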

Release Consistency Model M0 won’t see M1’s writes until M1 releases a lock, so machines can temporarily disagree on memory contents. If the program always follows the locking rules: ◦locks force an order → no stale reads → behaves like sequential consistency But if you don’t follow this guideline (don’t lock): ◦reads can return stale data ◦concurrent writes to the same variable → trouble (data race) Benefit? Multiple machines can have copies of a page, even while one or more of them write it ◦no bouncing of pages due to false sharing ◦read copies can co-exist with writers ◦relies on write diffs; otherwise concurrent writes to the same page couldn’t be reconciled

Lazy Release Consistency Model Do we really need to update the pages at the moment a lock is released? Suppose you never use a variable that some other processes update: you don’t need to be notified of those updates at all. So: only fetch write diffs on acquire of a lock, and only fetch them from the previous holder of that lock. Nothing happens at the time of a write or a release. This is called the Lazy Release Consistency (LRC) model, and it is yet another new consistency model! LRC hides some writes that RC reveals. Benefit? ◦if you don’t acquire the lock protecting an object, you don’t have to fetch updates to it ◦if you use just some variables on a page, no need to fetch writes to the others ◦less network traffic
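
In code form, the acquire-time work might look like the following sketch; every helper here is a hypothetical stand-in, not a real TreadMarks call:

    /* Hypothetical helpers standing in for the real protocol. */
    typedef struct diff_set diff_set;
    int ask_lock_server(int lock_id);          /* who held the lock last? */
    diff_set *fetch_diffs_from(int machine);   /* get their write diffs */
    void apply_diffs(diff_set *d);             /* bring our pages up to date */

    /* Under LRC, all the work happens here, at acquire time;
     * nothing is sent at the previous holder's writes or release. */
    void lrc_acquire(int lock_id) {
        int prev = ask_lock_server(lock_id);
        apply_diffs(fetch_diffs_from(prev));   /* only from the previous holder */
    }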

Conventional DSM Implementation

Sequential vs. Release Consistency Sequential consistency: ◦every write is broadcast ◦more message passing Release consistency: ◦writes are broadcast only at synchronization points ◦more memory overhead

Read-Write False Sharing (figure: process timelines showing w(x) on one process and r(y), r(x) on another, without synchronization)

Read-Write False Sharing (figure: the same timelines with a synchronization point added)

Write-Write False Sharing (figure: process timelines with writes w(x), w(y), a read r(x), and a synchronization point)

Multiple-Writer False Sharing (figure: process timelines with concurrent writers w(x), w(y), a reader r(x), and a synchronization point)

Example 1 (false sharing) x and y are on the same page. (a: acquire, r: release) M0: a1 for (…) x++ r1 M1: a2 for (…) y++ r2 a1 print x, y r1 What does IVY do? The page bounces between M0 and M1 on every write. What does TreadMarks do? ◦M0 and M1 both get a cached writeable copy of the page ◦when they release, each computes a diff against the original page ◦M1’s a1 causes it to pull write diffs from the last holder of lock 1, so M1 updates x in its copy of the page

Example 2 (LRC) x and y are on the same page M0: a1 x=1 r1 M1: a2 y=1 r2 M2: a1 print x r1 What does IVY do? It gives M2 the whole current page, including y. What does TreadMarks do? ◦M2 only asks the previous holder of lock 1 for write diffs ◦M2 does not see M1’s modification to y, even though it is on the same page

Discussion Q: is LRC a win over IVY if each variable is on a separate page? (No) Q: why is LRC a reasonably intuitive model for programmers? It is the same as sequential consistency as long as programmers always protect shared data with locks (i.e., follow the rules LRC defines). But non-locking code does not work, e.g. v=f(); done=1;
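
Why that idiom breaks, as a sketch (compute and use are hypothetical names): with no acquire/release pair connecting the two machines, LRC never obliges M1 to see M0’s writes, so M1 may spin forever or read a stale v:

    /* Shared variables with no lock protecting them;
     * compute() and use() are hypothetical. */
    int compute(void);
    void use(int);
    int v, done;

    void m0(void) {                  /* runs on machine M0 */
        v = compute();               /* write the value... */
        done = 1;                    /* ...then the flag, with no release */
    }

    void m1(void) {                  /* runs on machine M1 */
        while (!done)                /* no acquire: may spin forever, */
            ;                        /* or proceed and read a stale v */
        use(v);
    }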

Example 3 (motivating vector timestamps) M0: a1 x=1 r1 M1: a1 a2 y=x r2 r1 M2: a2 print x, y r2 What’s the “right” answer? ◦we need to define what LRC guarantees ◦answer: when you acquire a lock, you see all writes by the previous holder, and all writes the previous holder saw

What does TreadMarks do for example 3? ◦M2 and M1 need to decide what M2 needs and doesn’t already have ◦they use “vector timestamps” ◦each machine numbers its releases (i.e. its write diffs) ◦at release, M1 tells M2 which writes it had seen from each machine, e.g.: ◦0:20 ◦1:25 ◦2:19 ◦3:36 ◦… ◦this is a “vector timestamp” ◦M2 remembers a vector timestamp of the writes it has seen ◦M2 compares it with M1’s VT to see which writes it still needs from other machines
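
A sketch of that comparison in C; the struct layout and function name are assumptions for illustration:

    #include <stdio.h>

    #define NPROCS 4

    /* seen[i] = highest release (write-diff interval) from machine i
     * that this machine has already applied. */
    typedef struct { unsigned seen[NPROCS]; } vtime;

    /* On acquire: compare our VT against the releaser's and list the
     * diff intervals we still need, per machine. */
    void missing_intervals(const vtime *mine, const vtime *releaser) {
        for (int i = 0; i < NPROCS; i++)
            if (releaser->seen[i] > mine->seen[i])
                printf("need diffs %u..%u from machine %d\n",
                       mine->seen[i] + 1, releaser->seen[i], i);
    }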

Discussion VTs order writes to the same variable by different machines: ◦M0: a1 x=1 r1 a2 y=9 r2 ◦M1: a1 x=2 r1 ◦M2: a1 a2 z = x + y r2 r1 ◦M2 is going to hear “x=1” from M0 and “x=2” from M1 ◦How does M2 know what to do? Both writes were made under lock 1, so their VTs are ordered, and M2 applies them in VT order. Could the VTs for two values of the same variable be unordered? M0: a1 x=1 r1 M1: a2 x=2 r2 M2: a1 a2 print x r2 r1 Yes: M0 and M1 write x under different locks, so their VTs are concurrent; this is a data race

Programmer rules / system guarantees? ◦programmer must lock around all writes to shared variables, to order writes to the same variable; otherwise “latest value” is not well defined ◦to read the latest value, you must lock ◦if you read without locking, you are only guaranteed to see the values that contributed to the variables you did lock

Example of when LRC might work too hard M0: a2 z=99 r2 a1 x=1 r1 M1: a1 y=x r1 TreadMarks will send z to M1, because z=99 comes before x=1 in VT order ◦assuming x and z are on the same page ◦even if they are on different pages, M1 must invalidate z’s page But M1 doesn’t use z. How could a system know that z isn’t needed? ◦require locking of all data you read, thus relaxing the causal part of the LRC model

Q: could TreadMarks work without using VM page protection? It uses VM to ◦detect writes, to avoid making hidden copies (for diffs) when they aren’t needed ◦detect reads to pages → know whether to fetch a diff Neither is really crucial, so TreadMarks doesn’t depend on VM as much as IVY does: IVY used VM faults to decide what data has to be moved and when; TreadMarks uses acquire()/release() and diffs for that purpose
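
For concreteness, here is a hedged POSIX sketch of the write-detection half (twin creation on a write fault); error handling is omitted and this is not the actual TreadMarks code:

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SIZE 4096
    static char *twin;    /* hidden copy made on first write to the page */

    /* SIGSEGV handler: on the first write to a write-protected DSM page,
     * save a twin of the page, then unprotect it so the write proceeds.
     * Later, diffing the page against the twin yields the written bytes. */
    static void on_write_fault(int sig, siginfo_t *si, void *ctx) {
        (void)sig; (void)ctx;
        char *page = (char *)((uintptr_t)si->si_addr
                              & ~(uintptr_t)(PAGE_SIZE - 1));
        twin = malloc(PAGE_SIZE);
        memcpy(twin, page, PAGE_SIZE);
        mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);
    }

    void install_handler(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_write_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }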

TreadMarks Implementation Looks a lot like pthreads Implicit message passing Implicit process creation Only standard Unix system calls ◦message passing ◦memory management

TreadMarks Code
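
A hedged sketch of what a TreadMarks program looks like, using the published Tmk_* API (exact names and signatures may differ across releases):

    #include <stdio.h>
    #include "Tmk.h"

    #define N 1024
    int *shared;                      /* pointer into DSM-managed memory */

    int main(int argc, char **argv) {
        Tmk_startup(argc, argv);      /* implicit process creation */
        if (Tmk_proc_id == 0) {
            shared = (int *)Tmk_malloc(N * sizeof(int));
            Tmk_distribute((char *)&shared, sizeof(shared));
        }
        Tmk_barrier(0);

        /* Each process updates its own slice with ordinary stores;
         * message passing is implicit. */
        for (int i = Tmk_proc_id; i < N; i += Tmk_nprocs)
            shared[i] = i * i;

        Tmk_barrier(1);               /* diffs propagate at synchronization */
        if (Tmk_proc_id == 0)
            printf("shared[42] = %d\n", shared[42]);
        Tmk_exit(0);
    }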

Eager vs. Lazy RC Eager RC: ◦sends messages at lock release or at barriers ◦broadcasts messages to all nodes Lazy RC: ◦sends messages when locks are acquired ◦each message goes only to the required node

Eager vs. Lazy RC

Memory Consistency Maintained by creating diffs ◦Eager RC creates diffs at barriers ◦Lazy RC creates diffs only at the first use of a page

Twin Creation

Diff Organization

Vector Timestamps (figure: timelines for p1, p2, p3 with w(x), rel, acq, w(y), rel, and acq, r(x), r(y))

Diff chain in Proc 4

Garbage Collection Used to merge all diffs and recover memory Occurs only at barriers Every node that has a copy of a page must have all diffs of that page

DSM successful? Clusters of cooperating machines are hugely successful; DSM not so much ◦its main justification is transparency for existing threaded code ◦that’s not interesting for new apps ◦and transparency makes it hard to get high performance MapReduce, message passing, or shared storage are more common than DSM

Thank You! Any Questions?