Distributed Shared Memory: A Survey of Issues and Algorithms. B. Nitzberg and V. Lo, University of Oregon.


1 Distributed Shared Memory: A Survey of Issues and Algorithms. B. Nitzberg and V. Lo, University of Oregon

2 INTRODUCTION Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space

3 Why bother with DSM? Key idea is to build fast parallel computers that are –Cheaper than shared memory multiprocessor architectures –As convenient to use

4 Conventional parallel architecture [Figure: several CPUs, each with its own cache, attached to one shared memory]

5 Today's architecture Clusters of workstations are much more cost-effective –No need to develop complex bus and cache structures –Can use off-the-shelf networking hardware (Gigabit Ethernet, Myrinet at 1.5 Gb/s) –Can quickly integrate newest microprocessors

6 Limitations of cluster approach Communication within a cluster of workstations is through message passing –Much harder to program than concurrent access to a shared memory Many big programs were written for shared memory architectures –Converting them to a message passing architecture is a nightmare

7 Distributed shared memory [Figure: DSM = one shared global address space spanning the main memories of the workstations]

8 Distributed shared memory DSM makes a cluster of workstations look like a shared memory parallel computer –Easier to write new programs –Easier to port existing programs Key problem is that DSM only provides the illusion of having a shared memory architecture –Data must still move back and forth among the workstations

9 Basic approaches Hardware implementations: –Use extensions of traditional hardware caching architecture Operating system/library implementations: –Use virtual memory mechanisms Compiler implementations –Compiler handles all shared accesses

10 Design Issues (I) 1. Structure and granularity –Big units, such as virtual memory pages, are more efficient –But false sharing can occur whenever a page contains different variables that are accessed at the same time by different processors

11 False Sharing [Figure: one processor accesses x while another accesses y; x and y lie on the same page] The page containing x and y will move back and forth between the main memories of the workstations
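
The ping-pong effect on this slide can be illustrated with a small simulation. This is a minimal sketch under a simplifying assumption that is not in the slides: every access requires the page to be resident at the accessing node, so an access by a non-owner counts as one page move. The function name `count_page_moves` and the node names are hypothetical.

```python
def count_page_moves(accesses, page_of):
    """Count page migrations for a sequence of (node, variable) accesses.

    page_of maps each variable to the page that holds it. Assumed model:
    a page lives at exactly one node and moves whenever another node
    touches any variable on it.
    """
    owner = {}   # page id -> node currently holding the page
    moves = 0
    for node, var in accesses:
        page = page_of[var]
        if page in owner and owner[page] != node:
            moves += 1          # page must migrate to the accessing node
        owner[page] = node
    return moves

# Node A only touches x and node B only touches y, in alternation.
accesses = [("A", "x"), ("B", "y")] * 4

same_page = count_page_moves(accesses, {"x": 0, "y": 0})
separate_pages = count_page_moves(accesses, {"x": 0, "y": 1})
print(same_page, separate_pages)
```

Although the two nodes never touch the same variable, placing x and y on one page makes every access after the first trigger a transfer; placing them on separate pages removes the traffic entirely.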

12 Design Issues (II) 1. Structure and granularity (cont'd) –Shared objects can also be Objects from a distributed object-oriented system Data types from an extant language

13 Design Issues (III) 2. Coherence semantics –Strict consistency is not possible –Various authors have proposed weaker consistency models Cheaper to implement Harder to use in a correct fashion

14 Design Issues (IV) 3. Scalability –Possibly very high but limited by Central bottlenecks Global knowledge operations and storage

15 Design Issues (V) 4.Heterogeneity –Possible but complex to implement

16 Portability Issues Portability of programs –Some DSMs allow programs written for a multiprocessor architecture ("dusty decks") to run on a cluster of workstations without any modifications –More efficient DSMs require more changes Portability of DSM –Some DSMs require specific OS features (Not in paper)

17 Implementation Issues (I) 1. Data Location and Access: –Keep data at a single centralized location –Let data migrate (better), but must have a way to locate them: either a centralized server (bottleneck) or a "home" node associated with each piece of data that keeps track of its location
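
The home-node idea can be sketched as a directory that is split across the nodes. This is an illustrative sketch only: the class name `HomeDirectory`, the static `page % num_nodes` mapping, and the rule that an unmigrated page lives at its home are assumptions, not details taken from the paper.

```python
class HomeDirectory:
    """Each page has a fixed home node that records the page's current owner."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One table per home node: page id -> node currently holding the page.
        self.tables = [dict() for _ in range(num_nodes)]

    def home_of(self, page):
        # Assumption: a static mapping from page number to home node.
        return page % self.num_nodes

    def locate(self, page):
        home = self.home_of(page)
        # A page that has never migrated is assumed to still be at its home.
        return self.tables[home].get(page, home)

    def migrate(self, page, new_owner):
        # Only the page's home node has to be told about the move.
        self.tables[self.home_of(page)][page] = new_owner


directory = HomeDirectory(num_nodes=4)
directory.migrate(page=6, new_owner=1)
print(directory.home_of(6), directory.locate(6), directory.locate(7))
```

Because lookups for different pages go to different home nodes, no single server sees every request, which is the point of preferring home nodes over a centralized server.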

18 Implementation Issues (II) 1. Data Location and Access (cont'd): –Can either maintain a single copy of each piece of data or replicate it on demand –With replication, must either propagate updates to all replicas or use an invalidation protocol

19 Invalidation protocol [Figure] Before update: the copies hold X = 0. At update time: the writer's copy becomes X = 5 and the other copy, still holding X = 0, is marked INVALID

20 Main advantage Locality of updates: –A page that is being modified has a high likelihood of being modified again Invalidation mechanism minimizes consistency overhead –One single invalidation replaces many updates
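
The claim that one invalidation replaces many updates can be checked with a toy message count. The following is a rough sketch, assuming three nodes that each hold a copy of X and one message per remote copy touched; the function names are hypothetical and the cost of a later re-fetch by an invalidated node is deliberately left out.

```python
INVALID = "INVALID"


def write_with_invalidation(copies, writer, value):
    """Write at `writer`; invalidate every other still-valid copy.

    Returns the number of invalidation messages sent.
    """
    messages = 0
    for node in copies:
        if node != writer and copies[node] != INVALID:
            copies[node] = INVALID
            messages += 1
    copies[writer] = value
    return messages


def write_with_update(copies, writer, value):
    """Write at `writer`; push the new value to every other copy."""
    messages = 0
    for node in copies:
        if node != writer:
            copies[node] = value
            messages += 1
    copies[writer] = value
    return messages


inval_copies = {"P1": 0, "P2": 0, "P3": 0}
update_copies = dict(inval_copies)

# P1 writes X three times in a row (locality of updates).
inval_msgs = sum(write_with_invalidation(inval_copies, "P1", v) for v in (5, 6, 7))
update_msgs = sum(write_with_update(update_copies, "P1", v) for v in (5, 6, 7))
print(inval_msgs, update_msgs)
```

Only the first write costs anything under invalidation; the repeated writes are local. The trade-off not shown here is that P2 and P3 must fetch X again if they read it later.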

21 A realization: Munin Developed at Rice University Based on software objects (variables) Used the processor virtual memory to detect access to the shared objects Included several techniques for reducing consistency-related communication Only ran on top of the V kernel

22 Munin main strengths Excellent performance Portability of programs –Allowed programs written for a multiprocessor architecture ("dusty decks") to run on a cluster of workstations with a minimum number of changes

23 Munin main weakness Very poor portability of Munin itself –Depended on some features of the V kernel Not maintained since the late 80's

24 Consistency model Munin uses software release consistency –Only requires the memory to be consistent at specific synchronization points

25 SW release consistency (I) Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables –P(&mutex) and V(&mutex) –lock(&csect) and unlock(&csect) –acquire( ) and release( ) Unprotected accesses can produce unpredictable results

26 SW release consistency (II) SW release consistency will only guarantee correctness of operations performed within an acquire/release pair No need to export the new values of shared variables until the release Must guarantee that a workstation has received the most recent values of all shared variables when it completes an acquire

27 SW release consistency (III)
First process:
    shared int x;
    acquire( );
    x = 1;
    release( ); // export x = 1
Second process:
    shared int x;
    acquire( ); // wait for new value of x
    x++;
    release( ); // export x = 2

28 SW release consistency (IV) Must still decide how to release updated values –Munin uses eager release: new values of shared variables are propagated at release time

29 SW release consistency (V) Eager release Each release forwards the update to the two other processors.
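
The behaviour described on the last few slides can be sketched as a single-threaded simulation. This is a simplified model and not Munin's code: the `Node` class is hypothetical, the lock itself is omitted (the calls simply run in critical-section order), and one message per peer is assumed to carry all pending updates.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}      # local copy of the shared variables
        self.dirty = set()    # variables written since the last release
        self.peers = []       # the other nodes holding copies

    def write(self, var, value):
        self.memory[var] = value
        self.dirty.add(var)   # nothing is sent yet

    def release(self):
        """Eager release: push all pending updates to every peer now."""
        if not self.dirty:
            return 0
        messages = 0
        for peer in self.peers:
            for var in self.dirty:
                peer.memory[var] = self.memory[var]
            messages += 1     # assumed: one message per peer
        self.dirty.clear()
        return messages


p1, p2, p3 = Node("P1"), Node("P2"), Node("P3")
for node in (p1, p2, p3):
    node.memory["x"] = 0
    node.peers = [other for other in (p1, p2, p3) if other is not node]

p1.write("x", 1)
seen_before_release = p2.memory["x"]    # P2 has not been told yet
sent_by_p1 = p1.release()               # export x = 1

p2.write("x", p2.memory["x"] + 1)       # x++ on the fresh value
sent_by_p2 = p2.release()               # export x = 2
print(seen_before_release, sent_by_p1, sent_by_p2, p3.memory["x"])
```

Each release sends to the two other processors even though only one of them will enter the next critical section; that cost is what lazy release tries to avoid.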

30 Multiple write protocol Designed to fight false sharing Uses a copy-on-write mechanism Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write The first attempt to modify the contents of the page will result in the creation of a copy of the page as it was before the modification (the twin).

31 Creating a twin Not in paper

32 Example (Not in paper) Before: the page holds x = 1, y = 2. First write access: a twin holding x = 1, y = 2 is created. After: the page holds x = 3, y = 2. Compare with twin: the new value of x is 3
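
The twin-and-compare mechanism of the last three slides can be sketched as follows. This is a minimal illustration, assuming a page is modelled as a dictionary of variables rather than as raw memory words; the `Page` class and the merge step at the end are hypothetical and only meant to show why two writers of the same page no longer conflict.

```python
class Page:
    def __init__(self, contents):
        self.contents = dict(contents)
        self.twin = None              # created lazily on the first write

    def write(self, var, value):
        if self.twin is None:
            # First write: keep a copy of the unmodified page (the twin).
            self.twin = dict(self.contents)
        self.contents[var] = value

    def diff(self):
        """Return only the entries that differ from the twin."""
        if self.twin is None:
            return {}
        return {
            var: value
            for var, value in self.contents.items()
            if var not in self.twin or self.twin[var] != value
        }


original = {"x": 1, "y": 2}
page_a = Page(original)   # copy held by one workstation
page_b = Page(original)   # copy held by another workstation

page_a.write("x", 3)      # first writer only touches x
page_b.write("y", 7)      # second writer only touches y

diff_a, diff_b = page_a.diff(), page_b.diff()
merged = {**original, **diff_a, **diff_b}
print(diff_a, diff_b, merged)
```

Because each writer ships only its own changes, both copies can be modified concurrently and merged afterwards, instead of the whole page bouncing between the two workstations.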

33 Other DSM Implementations (I) Software release consistency with lazy release (TreadMarks) –Faster and designed to be portable Sequentially-Consistent Software DSM (IVY): –Sends messages to other copies at each write –Much slower
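
A rough message-count model suggests why lazy release can be faster than eager release. The model is an assumption made for illustration, not a description of TreadMarks internals: eager release sends one message to every other node at each release, while lazy release defers propagation until the next acquire and sends one transfer only when the lock moves to a different node. Both function names are hypothetical.

```python
def eager_messages(holders, num_nodes):
    """Every release pushes updates to all other nodes."""
    return len(holders) * (num_nodes - 1)


def lazy_messages(holders):
    """Updates travel only to the next acquirer, at acquire time."""
    return sum(1 for prev, cur in zip(holders, holders[1:]) if prev != cur)


# Sequence of nodes that hold the lock, in order, on an 8-node cluster.
holders = ["P1", "P2", "P2", "P3"]
eager_total = eager_messages(holders, num_nodes=8)
lazy_total = lazy_messages(holders)
print(eager_total, lazy_total)
```

In this model the eager cost grows with the cluster size, while the lazy cost depends only on how often the lock changes hands.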

34 Other DSM Implementations (II) Entry consistency (Midway): –Requires each variable to be associated with a synchronization object (typically a lock) –Acquire/release operations on a given synchronization object only involve the variables associated with that object –Requires less data traffic –Does not handle dusty decks well
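
The binding between a lock and its variables can be sketched as below. This is an illustrative model, not Midway's API: the `GuardedLock` class is hypothetical, node memories are plain dictionaries, and mutual exclusion is omitted because the example runs sequentially.

```python
class GuardedLock:
    """A lock bound to the shared variables it protects."""

    def __init__(self, guarded_vars):
        self.guarded = set(guarded_vars)
        self.latest = {}   # most recently released values of guarded variables

    def acquire(self, local_memory):
        # Only the variables bound to this lock are brought up to date.
        local_memory.update(self.latest)
        return len(self.latest)          # number of values transferred

    def release(self, local_memory):
        # Only guarded variables are exported; everything else stays local.
        for var in self.guarded:
            if var in local_memory:
                self.latest[var] = local_memory[var]


lock_a = GuardedLock(["a"])
node1, node2 = {}, {}

lock_a.acquire(node1)
node1["a"] = 1
node1["b"] = 99            # b is not bound to lock_a, so it is never exported
lock_a.release(node1)

transferred = lock_a.acquire(node2)
print(transferred, node2)
```

The write to `b` shows why dusty decks are a problem: a program that did not declare which lock protects `b` silently loses the update.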

35 Other DSM Implementations (III) Structured DSM Systems (Linda): –Offer to the programmer a shared tuple space accessed using specific synchronized methods –Require a very different programming style
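
To show how different the programming style is, here is a minimal single-process sketch of a tuple space with the three classic Linda operations: `out` (add a tuple), `rd` (read a matching tuple) and `in` (remove a matching tuple). The `TupleSpace` class is hypothetical, `in` is spelled `in_` because `in` is a Python keyword, and using `None` as a wildcard field is an assumption of this sketch.

```python
import threading


class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Add a tuple and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == field for p, field in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, *pattern):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            tup = self._match(pattern)
            while tup is None:
                self._cond.wait()
                tup = self._match(pattern)
            return tup

    def in_(self, *pattern):
        """Block until a matching tuple exists; remove and return it."""
        with self._cond:
            tup = self._match(pattern)
            while tup is None:
                self._cond.wait()
                tup = self._match(pattern)
            self._tuples.remove(tup)
            return tup


ts = TupleSpace()
ts.out("count", 0)
taken = ts.in_("count", None)        # removing the tuple acts like taking a lock
ts.out("count", taken[1] + 1)        # putting it back publishes the new value
print(taken, ts.rd("count", None))
```

A shared counter is updated by withdrawing its tuple and re-inserting a new one, rather than by assigning to a variable, which is why existing shared-memory programs do not port directly to this model.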

36 TODAY'S IMPACT Very low: –According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM" –DSM competed with the OpenMP model, and the OpenMP model won

