Distributed Shared Memory (part 1)

Distributed Shared Memory (DSM)
[Figure: processors proc0, proc1, proc2, …, procN, each with its own memory mem0 … memN, connected by a network and presented to programs as one shared memory]

Shared memory programming
Standard: pthread synchronization primitives
–Barriers
–Locks
–Semaphores

Sequential SOR
    for some number of timesteps/iterations {
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                     grid[i][j-1] + grid[i][j+1]);
        for (i = 1; i < n; i++)
            for (j = 1; j < n; j++)
                grid[i][j] = temp[i][j];
    }
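To make the slide runnable, here is a minimal sketch of the same sequential sweep. It assumes the grid is (n+1) x (n+1) with fixed boundary rows/columns 0 and n, stored row-major in a flat array; the helper macro AT and the function name sor_sequential are hypothetical, and the outer "some number of iterations" loop becomes an explicit iterations parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element (i, j) of an (n+1) x (n+1) grid stored
   row-major in a flat array. Rows/columns 0 and n hold fixed boundary
   values; rows/columns 1..n-1 are the interior that SOR updates. */
#define AT(a, n, i, j) ((a)[(size_t)(i) * (size_t)((n) + 1) + (size_t)(j)])

/* Sequential SOR as on the slide: each sweep computes every interior point
   from its four neighbours into temp, then copies temp back into grid. */
void sor_sequential(double *grid, double *temp, int n, int iterations)
{
    assert(n >= 2);
    for (int t = 0; t < iterations; t++) {
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                AT(temp, n, i, j) = 0.25 * (AT(grid, n, i - 1, j) + AT(grid, n, i + 1, j) +
                                            AT(grid, n, i, j - 1) + AT(grid, n, i, j + 1));
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                AT(grid, n, i, j) = AT(temp, n, i, j);
    }
}
```

For example, with n = 4 (a 5 x 5 array), boundary values of 1.0 and an interior of 0.0, one sweep sets grid[1][1] to 0.5 and grid[1][2] to 0.25, while grid[2][2] stays 0.0.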

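Parallel SOR with Barriers (1 of 2)
    void* sor (void* arg)
    {
        int slice = (int)(intptr_t)arg;        /* this thread's slice number, 0..p-1 */
        int from = (slice * (n-1))/p + 1;      /* first row of the slice */
        int to = ((slice+1) * (n-1))/p + 1;    /* one past the last row of the slice */
        for some number of iterations { … }
        return NULL;
    }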

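Parallel SOR with Barriers (2 of 2)
    for (i=from; i<to; i++)
        for (j=1; j<n; j++)
            temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] + grid[i][j-1] + grid[i][j+1]);
    barrier();   /* every slice of temp is computed before grid is overwritten */
    for (i=from; i<to; i++)
        for (j=1; j<n; j++)
            grid[i][j] = temp[i][j];
    barrier();   /* every slice of grid is updated before the next iteration reads it */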
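Putting the two slides together, a runnable sketch can use POSIX pthread_barrier_t for the slides' generic barrier(). The flat-array layout, the AT macro, the file-scope variables and the wrapper sor_parallel are assumptions made for illustration; the row partitioning is exactly the slide's from/to formula.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat (n+1) x (n+1) row-major layout. */
#define AT(a, n, i, j) ((a)[(size_t)(i) * (size_t)((n) + 1) + (size_t)(j)])

/* Shared state, standing in for the slides' global grid/temp/n/p. */
static double *grid, *temp;
static int n, p, iterations;
static pthread_barrier_t bar;

static void *sor(void *arg)
{
    int slice = (int)(intptr_t)arg;
    int from = (slice * (n - 1)) / p + 1;       /* first row of this slice */
    int to = ((slice + 1) * (n - 1)) / p + 1;   /* one past the last row */

    for (int t = 0; t < iterations; t++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < n; j++)
                AT(temp, n, i, j) = 0.25 * (AT(grid, n, i - 1, j) + AT(grid, n, i + 1, j) +
                                            AT(grid, n, i, j - 1) + AT(grid, n, i, j + 1));
        pthread_barrier_wait(&bar);   /* all of temp computed before grid changes */
        for (int i = from; i < to; i++)
            for (int j = 1; j < n; j++)
                AT(grid, n, i, j) = AT(temp, n, i, j);
        pthread_barrier_wait(&bar);   /* all of grid updated before the next sweep */
    }
    return NULL;
}

/* Runs `iters` sweeps over g (size (size+1)^2) with `nthreads` threads.
   Returns 0 on success, -1 if setup fails. */
int sor_parallel(double *g, double *tmp, int size, int iters, int nthreads)
{
    assert(size >= 2 && nthreads >= 1);
    grid = g; temp = tmp; n = size; p = nthreads; iterations = iters;

    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL)
        return -1;
    if (pthread_barrier_init(&bar, NULL, (unsigned)nthreads) != 0) {
        free(tids);
        return -1;
    }
    for (int s = 0; s < nthreads; s++)
        if (pthread_create(&tids[s], NULL, sor, (void *)(intptr_t)s) != 0)
            abort();   /* keep the sketch simple: a missing thread would deadlock the barrier */
    for (int s = 0; s < nthreads; s++)
        pthread_join(tids[s], NULL);
    pthread_barrier_destroy(&bar);
    free(tids);
    return 0;
}
```

The two pthread_barrier_wait calls play the role of the slide's barrier(): without the first, a fast thread could overwrite boundary rows of grid that a neighbouring slice still reads; without the second, a thread could begin the next sweep before its neighbours finished copying.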

Differences between SMP and Software DSM
–Delay: much higher network latency shifts tradeoffs such as block size
–Software => traps: read/write misses are detected by traps, which are expensive
–Goals of caches: multiprocessor = performance, distributed system = transparency
–Bus vs. long networks: an SMP can rely on the bus for serialization and broadcast; a network cannot

Consequent differences in protocols and applications
–Bigger block size
 –Cost amortization: higher hit ratio for larger blocks?
 –Reduced overhead
–But therefore...
 –Migration vs. replication
 –False sharing increases
–DSM protocol more complex: must handle lost, corrupted, and out-of-order packets
–The above, coupled with the cost of traps, => SDSM consistency cost much higher!

Results of high consistency costs
–Manage sharing more carefully
–Align data to page boundaries
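One way to align data to page boundaries is to give each processor its own page-aligned, page-sized slot, so that data written by different processors never shares a consistency unit. A minimal sketch, assuming POSIX posix_memalign and sysconf(_SC_PAGESIZE); the function names alloc_page_slots and page_slot are hypothetical, and in a real DSM the shared region itself would have to be page-aligned in the shared address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate one zeroed, page-aligned, page-sized slot per processor.
   Returns the base address (free with free()) and stores the page size. */
void *alloc_page_slots(int nprocs, size_t *page_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0 && nprocs > 0);
    size_t page = (size_t)ps;
    void *base = NULL;
    if (posix_memalign(&base, page, page * (size_t)nprocs) != 0)
        return NULL;
    memset(base, 0, page * (size_t)nprocs);
    *page_out = page;
    return base;
}

/* Processor `id`'s slot starts exactly on its own page. */
void *page_slot(void *base, size_t page, int id)
{
    return (char *)base + page * (size_t)id;
}
```

The cost of this choice is memory: a processor that needs only a few bytes still occupies a whole page, which is the price of avoiding false sharing at page granularity.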

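Consistency Models
Sequential Consistency
–All processors observe the same order of memory operations
–That order must correspond to some serial order (an interleaving of all processors' operations)
–The only ordering constraint is that the reads/writes of each processor (e.g., P1) appear in that processor's program order; there is no restriction on the relative ordering of operations from different processors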
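A classic litmus test makes this definition concrete. With x = y = 0 initially, P1 runs x = 1; r1 = y and P2 runs y = 1; r2 = x. The sketch below enumerates every interleaving that keeps each processor's two operations in program order and records the outcomes (r1, r2) that sequential consistency allows. The test program and the function name sc_outcomes are illustrative choices, not from the slides.

```c
#include <assert.h>
#include <stdbool.h>

/* Litmus test, memory initially x = y = 0:
     P1: x = 1; r1 = y;        P2: y = 1; r2 = x;
   Sets seen[r1][r2] for every outcome some SC interleaving produces. */
void sc_outcomes(bool seen[2][2])
{
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            seen[a][b] = false;

    /* An interleaving of the 4 operations is a choice of which 2 of the
       4 time slots belong to P1; each processor keeps program order. */
    for (int mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (int k = 0; k < 4; k++)
            bits += (mask >> k) & 1;
        if (bits != 2)
            continue;

        int x = 0, y = 0, r1 = -1, r2 = -1;
        int pc1 = 0, pc2 = 0;                 /* next operation per processor */
        for (int slot = 0; slot < 4; slot++) {
            if (mask & (1 << slot)) {         /* P1's turn */
                if (pc1++ == 0) x = 1; else r1 = y;
            } else {                          /* P2's turn */
                if (pc2++ == 0) y = 1; else r2 = x;
            }
        }
        assert(r1 >= 0 && r2 >= 0);           /* both reads happened */
        seen[r1][r2] = true;
    }
}
```

Three outcomes, (0,1), (1,0) and (1,1), appear; (0,0) never does, because it would require each read to precede the other processor's write while also following its own write, which no single serial order satisfies.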

Common consistency protocols (example: p1 writes X, which is also cached by p2 and p3)
Write update
–Multicast the update to all replicas
Write invalidate
–Invalidate the cached copies in p2, p3
–Cache miss if p2/p3 later access X; valid data comes from another cache
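The two protocols can be compared on a toy model: one cached copy of a single item X per processor, with p1 writing while p2 and p3 hold copies. The struct replica type and the functions write_update, write_invalidate and read_x are hypothetical, and message counts ignore acknowledgements.

```c
#include <assert.h>
#include <stdbool.h>

#define NREPLICAS 4

/* One processor's cached copy of the shared item X (toy model). */
struct replica { bool valid; int value; };

/* Write update: the writer multicasts the new value to every other replica.
   Returns the number of update messages. */
int write_update(struct replica r[NREPLICAS], int writer, int v)
{
    int msgs = 0;
    r[writer].value = v;
    r[writer].valid = true;
    for (int p = 0; p < NREPLICAS; p++)
        if (p != writer && r[p].valid) { r[p].value = v; msgs++; }
    return msgs;
}

/* Write invalidate: the writer invalidates every other cached copy.
   Returns the number of invalidation messages. */
int write_invalidate(struct replica r[NREPLICAS], int writer, int v)
{
    int msgs = 0;
    for (int p = 0; p < NREPLICAS; p++)
        if (p != writer && r[p].valid) { r[p].valid = false; msgs++; }
    r[writer].value = v;
    r[writer].valid = true;
    return msgs;
}

/* Read of X: on a miss, copy valid data from another cache. */
int read_x(struct replica r[NREPLICAS], int reader, bool *miss)
{
    *miss = !r[reader].valid;
    if (*miss) {
        for (int p = 0; p < NREPLICAS; p++)
            if (r[p].valid) { r[reader] = r[p]; break; }
        assert(r[reader].valid);   /* some cache must hold valid data */
    }
    return r[reader].value;
}
```

Update pays on every write, even if p2 and p3 never read X again; invalidate pays once per write burst, plus a miss for each later reader.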

Conventional Implementation
–As proposed by Li & Hudak (PODC '86; TOCS '89).
–Use virtual memory to implement sharing.
–Shared memory divided up by virtual memory pages.
–Use a single-writer, multiple-reader write-invalidate coherence protocol.
–Keep pages in one of three states:
 –invalid, read-only, read-write
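The virtual-memory mechanism can be sketched on a single machine: map a page with no access, catch the protection fault, and upgrade the page state one step per fault. This assumes Linux, where a protection violation raises SIGSEGV with si_addr set (other systems may raise SIGBUS). A real DSM would fetch the page or invalidate remote copies inside the handler; here that step is omitted. Calling mprotect from a signal handler is the usual trick in page-based software DSM, but mprotect is not on POSIX's async-signal-safe list, so treat this as illustrative. All names (dsm_init, dsm_fault, dsm_state, and so on) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page states from the slide. */
enum page_state { PAGE_INVALID, PAGE_READ_ONLY, PAGE_READ_WRITE };

static char *dsm_page;                      /* one "shared" page, local only */
static size_t dsm_page_size;
static volatile enum page_state dsm_state = PAGE_INVALID;
static volatile sig_atomic_t dsm_faults = 0;

/* Fault handler: upgrade protection one step and let the access retry.
   A write to an invalid page therefore faults twice in this sketch. */
static void dsm_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)ctx;
    char *addr = si->si_addr;
    if (addr < dsm_page || addr >= dsm_page + dsm_page_size) {
        signal(sig, SIG_DFL);               /* not our page: crash normally on retry */
        return;
    }
    dsm_faults++;
    if (dsm_state == PAGE_INVALID) {        /* read fault: a DSM would fetch a copy */
        mprotect(dsm_page, dsm_page_size, PROT_READ);
        dsm_state = PAGE_READ_ONLY;
    } else {                                /* write fault: a DSM would invalidate copies */
        mprotect(dsm_page, dsm_page_size, PROT_READ | PROT_WRITE);
        dsm_state = PAGE_READ_WRITE;
    }
}

/* Map one inaccessible page and install the fault handler. Returns 0 on success. */
int dsm_init(void)
{
    dsm_page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(dsm_page_size > 0);
    dsm_page = mmap(NULL, dsm_page_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dsm_page == MAP_FAILED)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = dsm_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After dsm_init(), the first read of the page faults once and leaves it read-only; the first write then faults again and makes it read-write; later accesses run at full speed with no traps.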

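Example
[Figure: proc0, proc1, proc2, …, procN above the shared memory; the following slides animate accesses to one page]

Example: Read Access Hit
The page is read-only or read-write locally, so the read proceeds with no protocol action.

Example: Write Access Hit
The page is read-write locally, so the write proceeds with no protocol action.

Example: Read Access Miss
The page is invalid locally.

Example: Read Fault
The access traps, and the DSM fault handler requests a copy of the page.

Example: Replication on Read
The page is copied to the reader; the reader and the previous holder both keep it read-only.

Example: Write Access Miss
The page is invalid locally.

Example: Write Fault
The access traps, and the DSM fault handler requests the page with write permission.

Example: Write Invalidation
All other copies are invalidated, and the writer gets the page read-write.

Example: Write Access to Read-Only
The page is present locally, but only read-only.

Example: Write Fault
The access traps to the DSM fault handler.

Example: Write Invalidation
All other copies are invalidated, and the writer's page becomes read-write.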
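The state changes walked through above can be condensed into two transition functions over one page's per-processor states. This is a minimal model of the single-writer, multiple-reader write-invalidate protocol; messages, page contents and owner lookup are left out, and the names dsm_read and dsm_write are hypothetical.

```c
#include <assert.h>

#define NPROCS 4

enum page_state { INVALID, READ_ONLY, READ_WRITE };

/* Read access by processor p. Returns 1 on a read fault, 0 on a hit. */
int dsm_read(enum page_state state[NPROCS], int p)
{
    assert(p >= 0 && p < NPROCS);
    if (state[p] != INVALID)
        return 0;                                /* read access hit */
    /* Read fault: fetch a copy; a read-write holder drops to read-only. */
    for (int q = 0; q < NPROCS; q++)
        if (state[q] == READ_WRITE)
            state[q] = READ_ONLY;
    state[p] = READ_ONLY;                        /* replication on read */
    return 1;
}

/* Write access by processor p. Returns 1 on a write fault, 0 on a hit. */
int dsm_write(enum page_state state[NPROCS], int p)
{
    assert(p >= 0 && p < NPROCS);
    if (state[p] == READ_WRITE)
        return 0;                                /* write access hit */
    /* Write fault (page invalid or read-only): invalidate all other copies. */
    for (int q = 0; q < NPROCS; q++)
        if (q != p)
            state[q] = INVALID;
    state[p] = READ_WRITE;                       /* single writer */
    return 1;
}
```

Starting from {READ_WRITE, INVALID, INVALID, INVALID}, a read by proc1 leaves proc0 and proc1 read-only, and a subsequent write by proc2 invalidates both and makes proc2 the only (read-write) holder.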

How to Remember Locations?
–Broadcast on miss (as in SMP)
–Static home
–Dynamic home or owner

Ownership and Owner Location
–Owner is the last writer.
–Owner maintains the copyset (the set of processors holding read-only copies).
–Every processor maintains a probable owner (not always the real owner).

Ownership Location
–Every read or write miss is sent to the processor named by the local probable-owner entry.
–If that processor is the owner, it handles the request; otherwise it forwards the request to its own probable owner.

Ownership Modification
–On a write miss, the new writer becomes the owner, and all forwarders set their probable owner to the requester.
–On a read miss, the requester sets its probable owner to the responding processor.

Example
–Initially, owner(page0) = p0, and probable owner(page0) = p0 everywhere.
–Write miss by p1: p1 sends the request to its probable owner (p0), which handles it; the new owner is p1, and probable owner(page0) on p0 becomes p1.
–Read miss by p2: p2 sends the request to its probable owner (p0), which forwards it to its own probable owner (p1), where it is handled; probable owner(page0) on p2 becomes p1.
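The probable-owner scheme and the example above can be replayed in a few lines. This single-page model follows the slides' rules: on a write miss the forwarders and the old owner point at the requester and the copyset is cleared; on a read miss only the requester updates its hint and the owner adds it to the copyset. The struct dsm_page type and the functions locate_owner, write_miss and read_miss are hypothetical; the return value counts request messages until the owner is reached.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 4

/* Single-page model of the dynamic owner scheme. */
struct dsm_page {
    int owner;                 /* true owner: the last writer */
    int prob_owner[NPROCS];    /* each processor's hint */
    bool copyset[NPROCS];      /* read copies, maintained by the owner */
};

/* Follow hints from requester; records forwarders in path[]. */
static int locate_owner(const struct dsm_page *pg, int requester,
                        int path[NPROCS], int *npath)
{
    int at = pg->prob_owner[requester];
    *npath = 0;
    while (at != pg->owner) {          /* not the owner: forward */
        path[(*npath)++] = at;
        assert(*npath < NPROCS);       /* hints must lead to the owner */
        at = pg->prob_owner[at];
    }
    return at;
}

int write_miss(struct dsm_page *pg, int requester)
{
    int path[NPROCS], npath;
    int old = locate_owner(pg, requester, path, &npath);
    for (int k = 0; k < npath; k++)
        pg->prob_owner[path[k]] = requester;   /* forwarders point at new owner */
    pg->prob_owner[old] = requester;           /* so does the old owner */
    for (int q = 0; q < NPROCS; q++)
        pg->copyset[q] = false;                /* read copies invalidated */
    pg->owner = requester;
    pg->prob_owner[requester] = requester;
    return npath + 1;
}

int read_miss(struct dsm_page *pg, int requester)
{
    int path[NPROCS], npath;
    int owner = locate_owner(pg, requester, path, &npath);
    pg->copyset[requester] = true;             /* owner records the copy */
    pg->prob_owner[requester] = owner;         /* responder becomes the hint */
    return npath + 1;
}
```

Replaying the slide, p1's write miss reaches p0 in one message, and p2's read miss then takes two (p2 to p0, forwarded to p1), after which p2's hint points directly at p1.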

Implementing synchronization
Use messages to implement synchronization primitives

Barriers
–Designate one processor as the barrier manager.
–When a process waits at a barrier, it sends an arrival message to the barrier manager and waits.
–When the barrier manager has received arrival messages from all processes, it sends a departure message to every process.
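The manager's side of this barrier reduces to a counter. A minimal sketch with hypothetical names (struct barrier_mgr, barrier_arrive): each call handles one arrival message and returns how many departure messages to send, which is zero until the last process arrives.

```c
#include <assert.h>

/* Barrier manager state for one barrier (hypothetical model). */
struct barrier_mgr {
    int nprocs;       /* processes taking part in the barrier */
    int arrived;      /* arrival messages received in this episode */
    int episode;      /* number of completed barriers */
};

/* Handle an arrival message from `proc`. Returns the number of departure
   messages to send: 0 while waiting, nprocs once everyone has arrived. */
int barrier_arrive(struct barrier_mgr *m, int proc)
{
    assert(proc >= 0 && proc < m->nprocs);
    m->arrived++;
    if (m->arrived < m->nprocs)
        return 0;                 /* keep waiting for more arrivals */
    m->arrived = 0;               /* reset for the next barrier */
    m->episode++;
    return m->nprocs;             /* one departure message per process */
}
```

Each barrier therefore costs 2 x nprocs messages (one arrival and one departure per process), all passing through the manager.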

Locks
–Designate one process as the lock manager for a particular lock.
–When a process wants to acquire the lock, it sends an acquire message to the manager and waits.
–The manager forwards the message to the last acquirer.
–If the lock is free, a lock grant message is sent to the requester.
–If the lock is held, the request is held until the lock is free, and then the lock grant message is sent.
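Forwarding each request to the last acquirer builds a distributed queue: every process holds at most one waiting request and hands the lock over on release. A minimal sketch, assuming the manager only remembers the last acquirer (the `last` field) and that a process counts as active from its request until its release; the slide does not say who sends the grant when the lock is free, so this model simply grants it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 4
#define NONE (-1)

/* One lock; `last` is what the manager remembers. */
struct lock_sim {
    int last;                 /* last acquirer the manager forwarded to */
    bool active[NPROCS];      /* requested and not yet released */
    bool holding[NPROCS];     /* granted the lock */
    int waiter[NPROCS];       /* request held by p until it releases */
};

/* p sends an acquire message. Returns true if granted immediately. */
bool lock_acquire(struct lock_sim *l, int p)
{
    assert(p >= 0 && p < NPROCS && !l->active[p]);
    int prev = l->last;               /* manager forwards to last acquirer */
    l->last = p;
    l->active[p] = true;
    if (prev == NONE || !l->active[prev]) {
        l->holding[p] = true;         /* lock free: grant */
        return true;
    }
    assert(l->waiter[prev] == NONE);
    l->waiter[prev] = p;              /* lock held: prev keeps the request */
    return false;
}

/* p releases; if it holds a request, it now sends the grant. */
void lock_release(struct lock_sim *l, int p)
{
    assert(l->holding[p]);
    l->holding[p] = false;
    l->active[p] = false;
    if (l->waiter[p] != NONE) {
        l->holding[l->waiter[p]] = true;
        l->waiter[p] = NONE;
    }
}
```

Tracking "active" rather than "holding" matters: a process that is still waiting must also queue later requests, otherwise two processes could be granted the lock at once.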

Problem: False Sharing
–Concurrent access to different data within the same consistency unit.
–With the page as consistency unit, there is lots of opportunity for false sharing.
–Two flavors:
 –read-write
 –write-write

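Read-Write False Sharing
[Figure: variables x and y located on the same page]
One processor writes x while another only reads y; because both live on the same page, the reader's copy is invalidated and refetched even though y never changed.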

Read-Write False Sharing (Cont.)
[Figure: timeline of accesses w(x), r(y), r(x), synch, w(x) by different processors to the shared page]

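Write-Write False Sharing
[Figure: timeline of accesses w(x), w(y), r(x), synch, w(x) by different processors to the shared page]
Two processors write x and y on the same page; under a single-writer protocol each write invalidates the other processor's copy, so the page moves back and forth between them.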
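The cost of write-write false sharing can be counted directly. In this toy model p0 repeatedly writes x and p1 repeatedly writes y under the single-writer write-invalidate protocol, with the page as consistency unit. The 4096-byte DSM_PAGE_SIZE, the page count and the names write_fault and count_faults are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NPROCS 2
#define NPAGES 4
#define DSM_PAGE_SIZE 4096   /* assumed page size */

enum page_state { INVALID, READ_ONLY, READ_WRITE };

/* Write by p to address addr; returns 1 if it faults, 0 on a hit. */
static int write_fault(enum page_state state[NPAGES][NPROCS], size_t addr, int p)
{
    size_t page = addr / DSM_PAGE_SIZE;   /* consistency unit = page */
    assert(page < NPAGES && p >= 0 && p < NPROCS);
    if (state[page][p] == READ_WRITE)
        return 0;
    for (int q = 0; q < NPROCS; q++)
        state[page][q] = INVALID;         /* invalidate every other copy */
    state[page][p] = READ_WRITE;
    return 1;
}

/* p0 writes x and p1 writes y, alternately, for `rounds` rounds.
   Returns the total number of write faults. */
int count_faults(size_t x_addr, size_t y_addr, int rounds)
{
    enum page_state state[NPAGES][NPROCS] = {{INVALID}};
    int faults = 0;
    for (int k = 0; k < rounds; k++) {
        faults += write_fault(state, x_addr, 0);
        faults += write_fault(state, y_addr, 1);
    }
    return faults;
}
```

With x and y on the same page every write faults (10 faults for 5 rounds); padding y onto the next page leaves only the two initial faults, which is the payoff of aligning data to page boundaries.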

Summary
–Software shared memory on distributed memory hardware.
 –Uses virtual memory.
–Home migration to improve locality
 –important because of high latencies.
–Sequential consistency suffers from false sharing