TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel.


1 TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel

2 Agenda
• DSM Overview
• TreadMarks Overview
• Vector Clocks
• Multi-writer Protocol (diffs)
• TreadMarks Algorithm
• Implementation
• Limitations

3 DSM Overview
• A global address space virtualizes physically separate memories
• Programs use ordinary thread and locking techniques (no MPI)
[Figure: four nodes, each with its own processor (Proc) and memory (Mem)]

4 DSM Overview
• Synchronizing memory incurs communication overhead
• To improve performance, maximize parallel computation and limit communication
[Figure: four nodes, each with its own processor (Proc) and memory (Mem)]

5 TreadMarks Overview
• Minimize communication to improve DSM performance
  - Lazy Release Consistency (Vector Clocks)
  - Multiple Writers (Lazy Diff Creation)
• Delay communication as long as possible (and possibly avoid it altogether)

6 TreadMarks Overview: Release Consistency
• Release Consistency: shared memory updates must be visible by the time the release is visible
  - No need to send updates immediately upon each write
[Figure: timelines for P1 and P2, with a write w(x)]

7 TreadMarks Overview: Lazy Release Consistency
• Lazy Release Consistency: shared memory updates are not made visible until the time of the acquire
  - Nothing is propagated if the update is never acquired by another process
[Figure: timelines for P1 and P2, with a write w(x)]
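
The difference between eager and lazy release can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `SharedMemoryModel`, its methods, and the message counter are hypothetical names invented here, lock traffic is ignored, and whole views are copied instead of diffs.

```python
class SharedMemoryModel:
    """Toy model: one private view per process plus an update-message counter."""

    def __init__(self, n_procs, lazy):
        self.views = [dict() for _ in range(n_procs)]
        self.released = [dict() for _ in range(n_procs)]
        self.lazy = lazy
        self.messages = 0  # counts update messages only (assumption)

    def write(self, pid, var, value):
        self.views[pid][var] = value  # purely local, no communication

    def read(self, pid, var):
        return self.views[pid].get(var)

    def release(self, pid):
        # Remember what this process has made available at the release.
        self.released[pid] = dict(self.views[pid])
        if not self.lazy:
            # Eager: push the updates to every other process right away.
            for other, view in enumerate(self.views):
                if other != pid:
                    view.update(self.released[pid])
                    self.messages += 1

    def acquire(self, pid, last_releaser):
        if self.lazy:
            # Lazy: only the acquirer pulls updates, and only at acquire time.
            self.views[pid].update(self.released[last_releaser])
            self.messages += 1
```

With four processes, one release costs three update messages in the eager variant, while the lazy variant sends one message to the single process that actually acquires, and none if nobody does.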

8 Vector Clocks
• Logical clock mechanism for identifying the causal ordering of events in distributed systems
  - Mattern (1989) and Fidge (1991)
[Figure: timelines for P1, P2, P3]

9 Vector Clocks
• Each process maintains a vector of counters
  - One for each process in the system
[Figure: timelines for P1, P2, P3, each starting at timestamp 000]

10 Vector Clocks
• Each process maintains a vector of counters
  - One for each process in the system
[Figure: timelines for P1, P2, P3, each starting at timestamp 000]

11 Vector Clocks
• A process increments its own counter upon a local event
[Figure: timelines for P1, P2, P3 with timestamps 000, 000, 000, 100]

12 Vector Clocks
• A process increments its own counter upon a local event
[Figure: timelines for P1, P2, P3 with timestamps 000, 000, 000, 100, 001]

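13 Vector Clocks
• Upon receiving a message, a process increments its own counter and updates every other counter to the larger of its own value and the value carried by the message
[Figure: timelines for P1, P2, P3 with timestamps 000, 000, 000, 100, 001, 202, 002]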

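14 Vector Clocks
• Upon receiving a message, a process increments its own counter and updates every other counter to the larger of its own value and the value carried by the message
[Figure: timelines for P1, P2, P3 with timestamps 000, 000, 000, 100, 001, 202, 002, 302, 312]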
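
The update rules on the vector clock slides can be sketched in a few lines. The class and function names below are illustrative assumptions, and the sketch treats sending a message as a local event, which is one common convention and is consistent with the timestamps shown in the figures.

```python
class VectorClock:
    """Minimal vector clock for process `pid` in a system of `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        # Rule 1: increment own counter on a local event.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self):
        # Assumption: a send counts as a local event; the returned
        # timestamp travels with the message.
        return self.local_event()

    def receive(self, timestamp):
        # Rule 2: take the element-wise maximum, then increment own counter.
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.pid] += 1
        return tuple(self.v)


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Replaying the figure with P1, P2, P3 as indices 0, 1, 2: P1 and P3 each perform a local event (100 and 001), P3 sends at 002, P1 receives and moves to 202, then P1 sends at 302 and P2 receives at 312. The timestamps 100 and 001 are concurrent, while 002 happened before 202.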

15 Diff Creation
• A process retains a copy (twin) of a page the first time it writes to the page
[Figure: P1 and P2]

16 Diff Creation
• A process retains a copy (twin) of a page the first time it writes to the page
[Figure: P1 and P2]

17 Diff Creation
• Create the diff by comparing the modified page against the original copy; the diff is run-length encoded (RLE)
[Figure: P1 and P2]

18 Diff Creation
• Send the diff to other processes
[Figure: P1 and P2]

19 Lazy Diff Creation
• Diffs are created only when a page is invalidated
• Or when the modifications are requested explicitly
[Figure: P1 and P2; label: access miss on invalidated page]
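
A minimal sketch of twinning and lazy diff creation, assuming pages are plain byte arrays. The names `make_diff`, `apply_diff`, and `Page` are hypothetical, and the run format (offset plus changed bytes) is just one simple way to run-length encode the modifications.

```python
def make_diff(twin, page):
    """Return a list of (offset, changed_bytes) runs where page differs from twin."""
    assert len(twin) == len(page)
    runs, i, n = [], 0, len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            runs.append((start, bytes(page[start:i])))
        else:
            i += 1
    return runs


def apply_diff(page, diff):
    """Patch a bytearray in place with the runs of a diff."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data


class Page:
    def __init__(self, data):
        self.data = bytearray(data)
        self.twin = None

    def write(self, offset, data):
        if self.twin is None:
            self.twin = bytes(self.data)  # twin is made on the first write only
        self.data[offset:offset + len(data)] = data

    def take_diff(self):
        """Create the diff only when it is asked for (lazy), then drop the twin."""
        if self.twin is None:
            return []
        diff = make_diff(self.twin, self.data)
        self.twin = None
        return diff
```

Because each diff records only the bytes its writer changed, two processes that modify disjoint parts of the same page can both have their diffs applied to a third copy without overwriting each other, which is the point of the multiple-writer protocol.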

20 TreadMarks Algorithm
• P1 cannot proceed past an acquire until:
  - All modifications have been received from processes whose vector timestamps are smaller than P1's
[Figure: timelines for P1 and P3 with timestamps 000, 000, 100, 001]

21 TreadMarks Algorithm
• On acquire:
  - P1 sends its vector timestamp to the releaser
[Figure: timelines for P1 and P3 with timestamps 000, 000, 100, 001; message carrying 100]

22 TreadMarks Algorithm
• On acquire:
  - P1 sends its vector timestamp to the releaser
  - P2 attaches invalidations for all updated counters
[Figure: timelines for P1 and P3 with timestamps 000, 000, 100, 001; messages carrying 100 and 101; label: invalidate]

23 TreadMarks Algorithm
• On acquire:
  - P1 sends its vector timestamp to the releaser
  - P2 attaches invalidations for all updated counters
  - P2 sends the updated vector timestamp with the invalidations
[Figure: timelines for P1 and P3 with timestamps 000, 000, 100, 001, 101; message carrying 101; label: invalidate]

24 TreadMarks Algorithm
• Diffs are generated when:
  - An invalidation is received (i.e., P1 had also made prior updates to this page)
  - The page is accessed (miss)
[Figure: timelines for P1 and P3 with timestamps 000, 000, 100, 001, 101; labels: invalidate, diff, w(x)]
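
The acquire exchange can be sketched as follows. This is a simplified model, not the TreadMarks code: `Interval`, `Node`, and their methods are hypothetical names, write notices are modeled as sets of page names, and diff transfer on an access miss is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    proc: int         # process that created the interval
    index: int        # that process's own counter value for the interval
    pages: frozenset  # write notices: pages modified during the interval


class Node:
    def __init__(self, pid, n_procs):
        self.pid = pid
        self.vc = [0] * n_procs  # vector timestamp
        self.intervals = []      # every interval this node knows about
        self.dirty = set()       # pages written since the last release
        self.invalid = set()     # pages invalidated by write notices

    def write(self, page):
        self.dirty.add(page)

    def release(self):
        # Close the current interval; nothing is sent at release time.
        if self.dirty:
            self.vc[self.pid] += 1
            self.intervals.append(
                Interval(self.pid, self.vc[self.pid], frozenset(self.dirty)))
            self.dirty.clear()

    def missing_for(self, acquirer_vc):
        # Intervals not covered by the acquirer's vector timestamp.
        return [iv for iv in self.intervals if iv.index > acquirer_vc[iv.proc]]

    def acquire_from(self, releaser):
        # The acquirer sends its timestamp; the releaser answers with the
        # missing intervals, whose write notices invalidate local pages.
        for iv in releaser.missing_for(list(self.vc)):
            self.invalid |= iv.pages
            self.intervals.append(iv)
            self.vc[iv.proc] = max(self.vc[iv.proc], iv.index)
```

Replaying the figure with P1 at index 0 and P3 at index 2: P1 is at 100, P3 writes x and releases at 001, and after P1 acquires from P3 its timestamp is 101 and page x is invalid. A later acquirer that obtains the lock from P1 learns about both intervals transitively.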

25 TreadMarks Implementation: Data Structures
[Figure: page array (one entry per page), proc array (one entry per proc_id), interval records (* an interval corresponds to a VC counter value), write notice records, and the diff pool]
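
One way to picture how these structures link together is sketched below. The field names and the `Tables` helper are assumptions made for illustration; only the structure names (page array, proc array, interval record, write notice record, diff pool) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class WriteNotice:
    page: int
    interval: "IntervalRecord"    # back-pointer to the owning interval
    diff: Optional[bytes] = None  # filled in lazily; would live in the diff pool


@dataclass(eq=False)
class IntervalRecord:
    proc_id: int
    vector_time: tuple
    write_notices: list = field(default_factory=list)


@dataclass(eq=False)
class PageEntry:
    state: str = "valid"
    # proc_id -> write notices that process has issued for this page
    notices_by_proc: dict = field(default_factory=dict)


class Tables:
    def __init__(self, n_pages, n_procs):
        self.page_array = [PageEntry() for _ in range(n_pages)]
        self.proc_array = [[] for _ in range(n_procs)]  # interval records per proc

    def record_interval(self, proc_id, vector_time, pages):
        """Register an interval and one write notice per modified page."""
        iv = IntervalRecord(proc_id, tuple(vector_time))
        for p in pages:
            wn = WriteNotice(p, iv)
            iv.write_notices.append(wn)
            self.page_array[p].notices_by_proc.setdefault(proc_id, []).append(wn)
        self.proc_array[proc_id].append(iv)
        return iv
```

The cross-links let a page fault handler walk from a page to the write notices that invalidated it, and from each notice to the interval (and hence the process and timestamp) that produced it.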

26 TreadMarks Implementation: Locks
• Each lock is statically assigned a manager (round-robin)
  - The manager keeps track of the last processor to request the lock
• Lock acquire requests are sent to the manager, which forwards them to the last processor to obtain the lock
• Upon release, the releaser sends (for each interval):
  - Processor ID and vector timestamp
  - Any invalidations that are necessary
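
The static manager assignment and request forwarding can be sketched like this. `LockDirectory` and its methods are hypothetical, the modulo mapping is one plausible reading of round-robin assignment, and the sketch assumes the manager itself is the initial holder of each lock.

```python
class LockDirectory:
    """Tracks, per lock, the processor that most recently requested it."""

    def __init__(self, n_procs):
        self.n_procs = n_procs
        self.last_requester = {}  # lock_id -> pid (state kept by the manager)

    def manager_of(self, lock_id):
        # Static round-robin assignment of locks to managers (assumption).
        return lock_id % self.n_procs

    def request(self, lock_id, requester):
        """Return (manager, granter): the manager receives the request and
        forwards it to the previous requester, who grants the lock."""
        manager = self.manager_of(lock_id)
        granter = self.last_requester.get(lock_id, manager)
        self.last_requester[lock_id] = requester
        return manager, granter
```

With four processors, lock 5 is managed by processor 1. The first request is granted by the manager itself, and each later request is forwarded to whichever processor asked for the lock just before.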

27 TreadMarks Implementation: Barriers
• Centralized barrier manager
• Upon arrival at the barrier, each client:
  - Notifies the manager of the intervals that the manager does not already have
  - These are incorporated when the manager arrives at the barrier
• When all clients have arrived:
  - The manager notifies each client of the intervals it does not already have
• Expensive
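
The barrier exchange can be modeled with sets of interval identifiers. This is a sketch under stated assumptions: the function `barrier` and the `(proc, index)` identifiers are invented for illustration, and one message is counted per client in each direction.

```python
def barrier(known, manager=0):
    """Simulate one centralized barrier.

    known[pid] is the set of interval ids that process pid knows about.
    The sets are updated in place; the return value is the message count.
    """
    messages = 0
    # Arrival: each client tells the manager about intervals it lacks.
    for pid, intervals in enumerate(known):
        if pid != manager:
            known[manager] |= intervals - known[manager]
            messages += 1
    # Departure: the manager tells each client about intervals it lacks.
    for pid in range(len(known)):
        if pid != manager:
            known[pid] |= known[manager] - known[pid]
            messages += 1
    return messages
```

For n processes this costs 2 × (n − 1) messages per barrier, and afterwards every process knows about every interval, which is why the slide labels barriers as expensive.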

28 Limitations
• Achieved nearly linear speedup for the TSP, Jacobi, Quicksort, and ILINK applications
• Water:
  - Each molecule in the simulation is protected by a lock and is frequently accessed
  - Barriers are used for synchronization
  - Speedup is limited by the algorithm's low computation-to-communication ratio (many fine-grained messages)

29 Limitations
• TSP: Eager Release Consistency performs better than Lazy Release Consistency (Fig. 9)
  - Under LRC, updates occur only on invalidations and access misses (writes/synchronization points)
  - The TSP algorithm reads the ‘current minimum’ value without synchronization, so it can read a stale value

30 Limitations
• Depends on events (writes/synchronization) to trigger consistency operations
• More opportunities to read stale data (TSP)
• Reduced redundancy increases the risk of data loss

31 Summary
• Improves performance by improving the computation-to-communication ratio
• Delays consistency updates until a lock is acquired or a page is accessed
• Weaker consistency implies a greater likelihood of reading stale data and of data loss
• Procrastination = Performance

