Presentation is loading. Please wait.

Presentation is loading. Please wait.

Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix.

Similar presentations


Presentation on theme: "Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix."— Presentation transcript:

1 Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix Conference 1994 2008-22952 Jun Lee

2 DSM (distributed shared memory)  A software system for parallel computation Shares distributed memories Easier programming −Provide a single global address space

3 DSM (distributed shared memory)  No widely available DSM implementations In-house research platforms Kernel modifications Poor performance −Imitating consistency protocols of hardware −False sharing

4 Treadmarks  Objectives Commercially available workstations and OS −Standard Unix system on DECstation Efficient user-level DSM implementation −Reduce communication overhead  Design LRC (lazy release consistency) Multiple writer protocol Lazy diff creation

5 Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):1

6 Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer P0P1 R(a):0 W(a):1 R(a):? R(a):1 Big problem with page size granularity

7 Page X Consistency protocol (SC)  Sequential Consistency Every write visible “immediately” Single writer W(x0):a W(x1):b a W(x2):c W(x3):d P0P1 a Page X bbccd False sharing

8 Consistency protocol (RC)  Release Consistency Relaxed memory consistency model −delay making its changes visible to other processors until certain synchronization accesses occurs Synchronization points −Acquire(), Release() (similar to locks, barriers) Two types −ERC (eager), LRC (lazy)

9 Consistency protocol (RC)  Release Consistency Acquire() and release() are sequentially consistent −Release() is performed after all previous operations have completed −Operations are performed after previous acquire() have been performed Acquire() and release() pair between conflicting accesses −SC and RC produce the same results.

10 Consistency protocol (RC)  ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1 Write Notice

11 Consistency protocol (RC)  ERC Write information is delivered at the release point P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):1 Release(L) Acquire(K) Release(K)

12 Consistency protocol (RC)  LRC The delivery is postponed until the acquire Fewer messages than ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) Acquire(L) R(a):? Release(L) R(a):1

13 Consistency protocol (RC)  ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC

14 Consistency protocol (RC)  ERC vs. LRC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 ERC P0P1 Acquire(L) R(a):0 W(a):1 Release(L) P2 R(a):0 P3 R(a):0 Acquire(L) Release(L) R(a):1 LRC

15 Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abc Acquire(L) Release(L) Acquire(L) ac W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing

16 Page X Multiple writer protocol Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd W(x0):a W(x1):b W(x2):c W(x3):d P0 P1 a Page X bc abcd False sharing Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d ac

17 Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac

18 Page X Twin and Diff Page X W(x0):a W(x1):b W(x2):c W(x3):? P0P1 abcd Acquire(L) Release(L) Acquire(L) Release(L) W(x3):d Twin X Diff ac diff ac Twin X b ac Diff b diff ac interval

19 Implementation

20 Etc.  Lock & barrier Statically assigned manager  Garbage collection reclaim the space used by write notice records, interval records, and diffs Triggered when the free space drops below a threshold

21 Evaluation  Experimental Environment 8 DECstation-5000/240 connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet  Applications Water – molecular dynamics simulation Jacobi – Successive Over-Relaxation TSP – branch & bound algorithm to solve the traveling salesman problem Quicksort – using bubblesort to sort subarray of less than 1K element ILINK – genetic linkage analysis

22 Evaluation Speedup Execution statistics

23 Evaluation Execution time breakdown

24 Evaluation Unix overhead breakdownTreadMarks overhead breakdown

25 Evaluation Execution time breakdown for Water

26 Evaluation ERC vs. LRC Speedup Message rate

27 Evaluation ERC vs. LRC Data rate Diff creation rate

28 Implementation... pages time stamp 000... pages time stamp 000 Acq(L) P0 sideP1 side P P

29 Implementation... pages time stamp 100... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a W(b) b twin P0 W.N P1 W.N P P

30 Implementation... pages time stamp 200... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P1 W.N

31 Implementation... pages time stamp 200... pages time stamp 000 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(b) b twin P0 W.N P0 100 P1 W.N

32 Implementation... pages time stamp 200... pages time stamp 110 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) W(b) b twin P0 W.N P0 W.N P1 b diff a P a W.N P0P1 100

33 Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 110 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N P twin b a

34 Implementation... pages time stamp 200 Acq(L) P0 sideP1 side W(a) Rel(L) a Acq(L) W(c) Rel(L) W(b) P0 W.N a diff P... pages time stamp 120 b P0 W.N P1 b diff a a W.N P0P1 100 c W.N twin b a

35 Implementation P1 sideP2 side P0P1 100... pages time stamp 000 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a 000110

36 Implementation P1 sideP2 side P0P1 100... pages time stamp 111 P Acq(L)... pages time stamp 120 b P0 W.N P1 b diff a a W.N c P1 W.N twin b a 000110 P0 W.N P1 W.N P1 W.N

37 Thank you ! Any questions?


Download ppt "Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix."

Similar presentations


Ads by Google