TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems Presented by: Blair Fort Oct. 28, 2004
Overview Introduction and Motivation Implementation Experiments and Results Conclusions My two cents
Introduction TreadMarks is a distributed shared memory (DSM) system for Unix workstations connected by an ATM or Ethernet network
Cluster Configuration
Distributed Shared Memory
Motivation No widely available DSM system Eliminate problems of other systems Bad portability Bad performance False sharing
Goals Ease of Use Portability Good Performance Also show that it works for real programs
Ease of Use Looks a lot like pthreads Implicit message passing Implicit process creation
Portability Only standard Unix System Calls Message Passing Memory Management
Performance False sharing Excessive message passing
Conventional DSM Implementation
Sequential vs. Release Consistency Sequential: every write is broadcast, which means more message passing Release: writes are broadcast only at synchronization points, at the cost of more memory overhead
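The trade-off on this slide can be illustrated with a toy message counter. This is only a sketch of the slide's simplified model (every write broadcast versus one broadcast per synchronization point); the function names and the trace format are hypothetical and are not part of TreadMarks.

```python
def messages_sequential(trace, n_nodes):
    # Toy model: every write is broadcast immediately to the other nodes.
    return sum(n_nodes - 1 for op in trace if op == "w")


def messages_release(trace, n_nodes):
    # Toy model: writes are buffered locally (this buffering is the extra
    # memory overhead) and broadcast once at the next synchronization point.
    messages = 0
    pending = False
    for op in trace:
        if op == "w":
            pending = True
        elif op == "sync" and pending:
            messages += n_nodes - 1
            pending = False
    return messages


# Hypothetical trace for one node in a 4-node cluster.
trace = ["w", "w", "w", "sync", "w", "sync"]
print(messages_sequential(trace, 4))  # 12
print(messages_release(trace, 4))     # 6
```

The gap grows with the number of writes between synchronization points, which is where release consistency saves the most traffic.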
Read-Write False Sharing w(x) r(y) r(x) w(x)
Read-Write False Sharing w(x) r(y) r(x) synch
Write-Write False Sharing w(x) w(y) r(x) synch w(x)
Multiple-Writer False Sharing w(x) w(y) r(x) synch w(x)
Eager vs. Lazy RC Eager: sends messages at the release of a lock or at barriers; broadcasts messages to all nodes Lazy: sends messages when locks are acquired; a message goes only to the node that requires it
Eager vs. Lazy RC
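A rough way to see the difference between the two protocols is to count which nodes receive update messages during a series of lock handoffs. The sketch below assumes a single lock and one message per update; the handoff list and function names are made up for illustration.

```python
def eager_release(n_nodes, handoffs):
    # Eager RC (toy model): at each release, the releaser pushes its
    # updates to every other node, whether or not that node needs them.
    received = [0] * n_nodes
    for releaser, _acquirer in handoffs:
        for node in range(n_nodes):
            if node != releaser:
                received[node] += 1
    return received


def lazy_release(n_nodes, handoffs):
    # Lazy RC (toy model): updates move only when a lock is acquired,
    # and only to the node that acquires it.
    received = [0] * n_nodes
    for releaser, acquirer in handoffs:
        if acquirer != releaser:
            received[acquirer] += 1
    return received


# Hypothetical handoffs (releaser, acquirer): nodes 0 and 1 pass a lock
# back and forth while nodes 2 and 3 never touch it.
handoffs = [(0, 1), (1, 0), (0, 1)]
print(eager_release(4, handoffs))  # [1, 2, 3, 3]
print(lazy_release(4, handoffs))   # [1, 2, 0, 0]
```

In this trace the nodes that never acquire the lock receive three messages each under the eager model and none under the lazy one.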
Memory Consistency Done by creating diffs Eager RC creates diffs at barriers Lazy RC creates diffs at the first use of a page
Twin Creation
Diff Organization
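The twin and diff mechanism can be sketched in a few lines. This is a minimal illustration, assuming a tiny 16-byte page and a diff stored as a list of (offset, changed bytes) runs; the names `make_twin`, `make_diff`, and `apply_diff` are hypothetical, and the twin is taken explicitly here rather than from a page-fault handler.

```python
PAGE_SIZE = 16  # hypothetical tiny page, for readability


def make_twin(page):
    # The twin is an unmodified copy of the page, taken before writing.
    return bytes(page)


def make_diff(twin, page):
    # Compare the page with its twin and record each run of changed
    # bytes as (offset, data).
    diff = []
    i = 0
    n = len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page, diff):
    # Patch only the bytes the diff names; the rest of the page is untouched.
    for offset, data in diff:
        page[offset:offset + len(data)] = data


# Two writers modify different parts of the same page concurrently.
page_a = bytearray(PAGE_SIZE)
page_b = bytearray(PAGE_SIZE)
twin_a = make_twin(page_a)
twin_b = make_twin(page_b)
page_a[0:2] = b"\x01\x02"
page_b[8:10] = b"\x09\x0a"

# At synchronization, each writer encodes its changes and applies the other's.
diff_a = make_diff(twin_a, page_a)
diff_b = make_diff(twin_b, page_b)
apply_diff(page_a, diff_b)
apply_diff(page_b, diff_a)
print(diff_a)            # [(0, b'\x01\x02')]
print(page_a == page_b)  # True
```

Because each diff carries only the bytes its writer changed, the two copies converge without either writer overwriting the other's data, which is what makes multiple concurrent writers to one page workable.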
Vector Timestamps w(x) rel acq w(y) rel p1 p2 p3 acq r(x) r(y)
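The timeline on this slide can be replayed with plain lists as vector timestamps. This is a sketch under simplifying assumptions: one entry per process, an interval that closes at each release, and hypothetical helper names `merge` and `missing_intervals`. Process indices are 0-based, so index 0 is p1.

```python
def merge(a, b):
    # Element-wise maximum: the acquirer learns everything the releaser knew.
    return [max(x, y) for x, y in zip(a, b)]


def missing_intervals(acquirer_vt, releaser_vt):
    # (process, interval) pairs the releaser knows about but the acquirer
    # has not seen yet; these are the intervals whose writes must be fetched.
    return [(proc, interval)
            for proc, (have, known) in enumerate(zip(acquirer_vt, releaser_vt))
            for interval in range(have + 1, known + 1)]


# p1: w(x), rel  -> p1 has completed its interval 1.
vt1 = [1, 0, 0]

# p2: acq from p1, w(y), rel
vt2 = [0, 0, 0]
need2 = missing_intervals(vt2, vt1)
vt2 = merge(vt2, vt1)
vt2[1] += 1  # the release closes p2's interval 1

# p3: acq from p2, then r(x), r(y)
vt3 = [0, 0, 0]
need3 = missing_intervals(vt3, vt2)
vt3 = merge(vt3, vt2)

print(need2)  # [(0, 1)]
print(need3)  # [(0, 1), (1, 1)]
```

p3 never synchronized with p1 directly, yet the timestamp it gets from p2 shows that p1's write to x must also be visible before r(x).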
Diff chain in Proc 4
Garbage Collection Used to merge all diffs – recover memory Occurs only at barriers All nodes that have a page must have all diffs of that page
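The garbage collection rule can be sketched as follows. The data layout is an assumption made for illustration: a single page, a globally ordered diff log, and a per-node count of diffs already applied. A real system would keep diffs per interval and order them by vector timestamp.

```python
def apply_diff(page, diff):
    # A diff here is a dict of {offset: new value}.
    for offset, value in diff.items():
        page[offset] = value


def garbage_collect(nodes, diff_log):
    # Run at a barrier: bring every node holding the page fully up to date,
    # after which no node can ever request an old diff again.
    for node in nodes.values():
        for diff in diff_log[node["applied"]:]:
            apply_diff(node["page"], diff)
        node["applied"] = 0
    # Every copy is current, so the diffs can be discarded.
    diff_log.clear()


# Hypothetical state: node "a" has applied no diffs, node "b" the first two.
diff_log = [{0: 1}, {1: 2}, {0: 3}]
nodes = {
    "a": {"page": [0, 0, 0, 0], "applied": 0},
    "b": {"page": [1, 2, 0, 0], "applied": 2},
}
garbage_collect(nodes, diff_log)
print(nodes["a"]["page"])  # [3, 2, 0, 0]
print(nodes["b"]["page"])  # [3, 2, 0, 0]
print(diff_log)            # []
```

The requirement that every holder of a page has all of its diffs is what makes the final `clear()` safe.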
Testing Platform 8 DECstation-5000/240s running Ultrix V4.3 Network: 100 Mbps ATM, 10 Mbps Ethernet
Testing Programs Modified Water from Splash Jacobi TSP QuickSort ILINK
Unix Overhead
TreadMarks Overhead
Network Comparison - Water
Lazy vs Eager RC
Message Rate
Data Rate
Diff Creation Rate
Conclusions Automated Distributed Shared Memory system works for real programs! LRC improves performance over ERC in most cases
My Thoughts Good design – promotes re-use Would like to see a comparison against hand-coded message passing Why not a partial merging of diffs?
Comments/Questions