Lazy Release Consistency for Software Distributed Shared Memory Pete Keleher Alan L. Cox Willy Z. By Nooruddin Shaik.


1 Lazy Release Consistency for Software Distributed Shared Memory
Pete Keleher, Alan L. Cox, Willy Z.
By Nooruddin Shaik

2 Lazy Release Consistency
Problem: reduce both the number of messages and the amount of data exchanged for remote memory accesses.
Importance: these reductions matter most for programs that exhibit false sharing and make extensive use of locks.

3 Overview
- Software DSM
- Release Consistency
- Eager Release Consistency
- Lazy Release Consistency
- Simulations
- Conclusion
- Future Work

4 Software DSM
- Software DSM is a runtime system that provides the shared address space abstraction across a message-passing cluster of computers.
- It relies on (user-level) memory management techniques to detect accesses and updates to shared data.
- Communication overheads are high and coherence units are large (page-sized).
- Sending messages is expensive in software DSM.

5 Pipelining Remote Memory Accesses in DASH
The DASH implementation of RC combats memory latency by pipelining writes to shared memory. The processor is stalled only when executing a release; at that time it must wait for all its previous writes to perform.

6 Problem with DASH
Pipelining of writes in DASH increases the number of messages passing through the network. Munin’s write-shared protocol, a software implementation of RC, therefore buffers writes until a release instead of pipelining them.

7 Merging of Remote Memory Updates in Munin
At the release, all writes going to the same destination are merged into a single message. Even so, Munin’s write-shared protocol may send more messages than a message-passing implementation of the same application.

8 RC – Formal Definition
A system is release consistent if:
- Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
- Before a release is allowed to perform with respect to any other processor, all previous reads and writes must be performed.
- Special accesses are sequentially consistent with each other.

9 Eager Release Consistency (based on Munin’s write-shared protocol)
ERC: modifications are propagated at release.
Invalidate protocol:
- Sends invalidations for all modified pages to the other processors that cache these pages.
Update protocol:
- Sends a diff of each modified page to the other cachers, where it is merged into the cached copy.
- Diffs limit the amount of data exchanged.

10 Eager Release Consistency (contd.)
Acquire:
- No consistency-related operations.
- The protocol locates the processor that last executed a release on the same variable.
Access miss:
- A message is sent to the directory manager.
- The directory manager forwards the request to the current owner.

11 Repeated Updates of Cached Copies in Eager RC
In the figure, processors P1 through P4 repeatedly acquire the lock l, write the shared variable x, and then release l. If an update policy is used in conjunction with Munin’s write-shared protocol and x is present in all caches, then all of these cached copies are updated at every release.

12 Lazy Release Consistency Rather than eagerly “sync up” data at release point, LRC “lazily” waits until the subsequent acquire. Propagation of modifications postponed until the time of an acquire.

13 Lazy Release Consistency
At the time of the acquire, the acquiring processor determines which modifications it needs to see according to the definition of RC. To do this, LRC uses a representation of the happened-before-1 partial order introduced by Adve and Hill.
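The effect of postponing propagation until the acquire can be illustrated with a rough message count for the lock-passing pattern of P1 through P4. This is only a sketch under stated assumptions, not the cost model used in the study: `count_update_messages` is a hypothetical helper, x is assumed to be cached on every processor, each update is assumed to cost one message per destination, and lock request/grant and acknowledgement messages are ignored.

```python
def count_update_messages(lock_holders, num_procs):
    """Count data-carrying update messages for a sequence of lock holders.

    Assumptions (illustrative only): x is cached on every processor, each
    update costs one message per destination, and lock/ack traffic is ignored.
    """
    eager = 0
    lazy = 0
    previous = None
    for holder in lock_holders:
        # Lazy: the previous holder's modification travels only to the
        # processor that acquires the lock next.
        if previous is not None and previous != holder:
            lazy += 1
        # Eager update: at its release, the holder updates every other cacher.
        eager += num_procs - 1
        previous = holder
    return eager, lazy


# P1..P4 take the lock in turn, twice around.
eager, lazy = count_update_messages([1, 2, 3, 4, 1, 2, 3, 4], num_procs=4)
print(eager, lazy)
```

Under these assumptions the eager count grows with the number of cachers, while the lazy count grows only with the number of lock transfers.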

14 happened-before-1 Partial Order
Shared memory accesses are partially ordered by happened-before-1, denoted by $\xrightarrow{hb1}$, defined as follows:
- If $a_1$ and $a_2$ are accesses on the same processor, and $a_1$ occurs before $a_2$ in program order, then $a_1 \xrightarrow{hb1} a_2$.
- If $a_1$ is a release on processor $p_1$, and $a_2$ is an acquire on the same location on processor $p_2$, and $a_2$ returns the value written by $a_1$, then $a_1 \xrightarrow{hb1} a_2$.
- If $a_1 \xrightarrow{hb1} a_2$ and $a_2 \xrightarrow{hb1} a_3$, then $a_1 \xrightarrow{hb1} a_3$.
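The three rules of happened-before-1 can be sketched as a small transitive-closure computation. The function name `happened_before_1`, the string labels for accesses, and the input layout are all assumptions made for illustration; a real LRC system does not materialize this relation but represents it with vector timestamps.

```python
def happened_before_1(program_order, sync_pairs):
    """Return the set of (a, b) pairs with a happened-before-1 b.

    program_order: dict mapping a processor name to its accesses, in order.
    sync_pairs: (release, acquire) pairs where the acquire returns the
                value written by the release.
    """
    # Rule 2: a release precedes the acquire that reads its value.
    closure = set(sync_pairs)
    # Rule 1: program order on each processor.
    for accesses in program_order.values():
        for i in range(len(accesses)):
            for j in range(i + 1, len(accesses)):
                closure.add((accesses[i], accesses[j]))
    # Rule 3: transitivity (Warshall's algorithm).
    nodes = [a for accesses in program_order.values() for a in accesses]
    for k in nodes:
        for a in nodes:
            if (a, k) in closure:
                for b in nodes:
                    if (k, b) in closure:
                        closure.add((a, b))
    return closure


order = {
    "P1": ["w1(x)", "rel1(l)"],
    "P2": ["acq2(l)", "r2(x)"],
    "P3": ["w3(y)"],
}
hb1 = happened_before_1(order, [("rel1(l)", "acq2(l)")])
print(("w1(x)", "r2(x)") in hb1)   # ordered through the release/acquire pair
print(("w1(x)", "w3(y)") in hb1)   # no synchronization links P1 and P3
```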

15 Write Notices
A write notice is an indication that a page has been modified in a particular interval, but it doesn't contain the actual modifications. In LRC, consistency is guaranteed by write notices.

16 Write Notice Propagation
- The execution of each processor is divided into intervals.
- An interval is performed at a processor when all modifications made during that interval have been performed at that processor.
- $V_p(i)$ is the vector timestamp for interval $i$ of processor $p$.
- Number of elements in $V_p(i)$ = number of processors.
- Entry for $p$ in $V_p(i)$ = $i$.
- Entry for $q$ in $V_p(i)$ = most recent interval of $q$ performed at $p$.

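17 Write Notice Propagation
$V_{p1}(i_{p1}) = \{i_{p1}, 0, 0, 0\}$
$V_{p2}(i_{p2}) = \{i_{p1}, i_{p2}, 0, 0\}$
$V_{p3}(i_{p3}) = \{0, i_{p2}, i_{p3}, 0\}$
$V_{p4}(i_{p4}) = \{0, 0, i_{p3}, i_{p4}\}$
[Figure: timelines for P1 through P4 showing the operations w(x), rel, acq, w(x), rel, acq, r(x) and the intervals $i_{p1}$, $i_{p2}$, $i_{p3}$, $i_{p4}$]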
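How vector timestamps identify the write notices an acquirer is missing can be sketched as follows. Both function names are hypothetical, processors are indexed from 0, and merging by element-wise maximum is an assumption of this sketch rather than something stated on the slides; only the P1-to-P2 step of the example is reproduced.

```python
def missing_intervals(acquirer_vt, releaser_vt):
    """Intervals known to the releaser but not yet performed at the acquirer.

    Returns {processor_index: [interval numbers]}. Write notices for exactly
    these intervals would have to accompany the lock grant.
    """
    return {
        q: list(range(mine + 1, theirs + 1))
        for q, (mine, theirs) in enumerate(zip(acquirer_vt, releaser_vt))
        if theirs > mine
    }


def acquire(acquirer_id, acquirer_vt, releaser_vt):
    """New vector timestamp for the interval the acquirer starts.

    Assumption: entries are merged by element-wise maximum, and the
    acquirer's own entry advances to its next interval.
    """
    merged = [max(a, b) for a, b in zip(acquirer_vt, releaser_vt)]
    merged[acquirer_id] = acquirer_vt[acquirer_id] + 1
    return merged


v_p1 = [1, 0, 0, 0]   # P1 released in its interval 1
v_p2 = [0, 0, 0, 0]   # P2 has performed no intervals yet
print(missing_intervals(v_p2, v_p1))   # P2 lacks interval 1 of P1
print(acquire(1, v_p2, v_p1))          # P2's interval 1 now covers P1's interval 1
```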

18 Data Movement Protocols
Multiple-writer protocol: allows multiple processors to write to different parts of the same page concurrently, without intervening synchronization.
False sharing:
- Occurs when two or more processors access different variables within a page, with at least one of the accesses being a write.
- Generates a large amount of message traffic.
DASH: exclusive-write protocols.
LRC: multiple-writer protocols.
- Allow processors to write into falsely shared pages.
- Modifications are merged using diffs.
- Message traffic is reduced.
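Merging concurrent writes with diffs can be sketched as below. The helpers `make_diff` and `apply_diff` are hypothetical, and comparing a page against a pristine copy byte by byte is an assumption of this sketch; the slides only state that modifications are merged using diffs.

```python
def make_diff(pristine: bytes, page: bytes) -> dict:
    """Record only the bytes that changed relative to the pristine copy."""
    return {i: new for i, (old, new) in enumerate(zip(pristine, page)) if old != new}


def apply_diff(page: bytearray, diff: dict) -> None:
    """Merge a diff into a local copy of the page, in place."""
    for offset, value in diff.items():
        page[offset] = value


pristine = bytes(8)            # the page before either processor writes

copy_p1 = bytearray(pristine)  # P1 writes a variable at offset 0
copy_p1[0] = 1
copy_p2 = bytearray(pristine)  # P2 writes a different variable at offset 7
copy_p2[7] = 9

# A third processor merges both diffs; the falsely shared writes do not conflict.
merged = bytearray(pristine)
apply_diff(merged, make_diff(pristine, bytes(copy_p1)))
apply_diff(merged, make_diff(pristine, bytes(copy_p2)))
print(list(merged))
```

Each diff carries only the modified bytes, which is why the amount of data exchanged stays well below a full page transfer in this example.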

19 Invalidate Vs Update
Invalidate:
- The acquiring processor invalidates all pages in its cache for which it receives write notices.
Update:
- The acquiring processor updates the pages for which it receives write notices.
- Diffs must be obtained for all concurrent modifiers.
- For interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no $k$ with $j \xrightarrow{hb1} k \xrightarrow{hb1} i$ in which the modifications from $j$ were overwritten.

20 EI Vs LI

21 EU Vs LU

22 Access Misses
- A copy of the page as well as a number of diffs may have to be retrieved.
- The modifications summarized by the diffs are merged before the access.
Access miss:
- At interval $i$, diffs must be obtained from all intervals $j$ such that $j \xrightarrow{hb1} i$ and there exists no interval $k$ such that $j \xrightarrow{hb1} k \xrightarrow{hb1} i$.
If the processor has an invalidated copy of the page:
- Write notices contain all the information necessary to determine which diffs need to be applied.
- This reduces the amount of data sent.
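Selecting the intervals to fetch diffs from on an access miss can be sketched with vector timestamps. This is a simplification under loud assumptions: `concurrent_last_modifiers` is a hypothetical name, every interval passed in is assumed to have modified the missing page and to precede the current interval, and any later modifying interval is treated as the intervening interval $k$.

```python
def concurrent_last_modifiers(modifiers):
    """Pick the modifying intervals that no other modifying interval follows.

    modifiers: dict mapping (processor_index, interval_number) to that
               interval's vector timestamp (a list, one entry per processor).
    """
    def precedes(j, k):
        # Interval j happened before interval k if k's timestamp already
        # covers j's interval number on j's processor.
        proc_j, num_j = j
        return j != k and modifiers[k][proc_j] >= num_j

    return sorted(
        j for j in modifiers
        if not any(precedes(j, k) for k in modifiers)
    )


page_modifiers = {
    (0, 1): [1, 0, 0],  # P0's interval 1
    (1, 1): [1, 1, 0],  # P1's interval 1, which has seen P0's interval 1
    (2, 1): [0, 0, 1],  # P2's interval 1, concurrent with both
}
print(concurrent_last_modifiers(page_modifiers))
```

In this example P0's interval is followed by P1's, so under the stated simplification diffs would be requested only for P1's and P2's intervals.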

23 Simulation
The simulation study was based on multiprocessor traces of five shared-memory application programs from the SPLASH suite. It measured the number of messages and the amount of data exchanged by each program for an execution using the four proposed protocols: Lazy Update (LU), Lazy Invalidate (LI), Eager Update (EU), and Eager Invalidate (EI).
Methodology: a trace was generated from a 32-processor execution of each program using the Tango multiprocessor simulator. These traces were then fed into a protocol simulator, which simulated page sizes from 512 to 8192 bytes. The authors assumed infinite caches and reliable FIFO communication channels, but did not assume any broadcast or multicast capability of the network.

24 Shared memory operation message costs
- M = number of concurrent last modifiers for the missing page
- H = number of other concurrent last modifiers for any local page
- C = number of other cachers of the page
- N = number of processors in the system
- P = number of pages in the system
- $U = \sum_{i=1}^{n}$ (number of other cachers of pages modified by $i$)
- $V = \sum_{i=1}^{n}$ (number of excess invalidations of page $i$)

25 SPLASH Program Suite
Five SPLASH benchmark programs were used to simulate the four protocols (LU, LI, EU, EI):
- LocusRoute
- Cholesky Factorization
- MP3D
- Water
- Pthor
For each program, the authors compared the amount of data and the number of messages exchanged under the four protocols.

26 LocusRoute Synchronization is dominated by locks

27 Cholesky Factorization Synchronization is dominated by locks

28 Pthor Synchronization is dominated by locks

29 MP3D Synchronization is dominated by barriers

30 Water Synchronization is dominated by barriers

31 Eager Vs Lazy

32 Conclusion
- The performance of software DSM is sensitive to the number of messages and the amount of data exchanged to create the shared memory abstraction.
- LRC aims to reduce both the number of messages and the amount of data exchanged by allowing changes to propagate lazily, only when needed.

33 Future Work
The expected future work is an implementation of lazy release consistency, in order to measure the runtime cost of the algorithm.

34 References www-cse.ucsd.edu/classes/fa99/cse221/OSSurveyF99/ papers http://www.cs.rochester.edu/research/cashmere/SC95/lazeag.html

