1 Distributed Shared Memory
Providing a shared-memory abstraction in a distributed-memory system
2 Shared Memory Programming Model
In a shared memory system, cooperating processes/threads communicate by reading and writing shared memory.
The OS provides system calls to allow separate processes to share memory and communicate.
Threads communicate through global memory in the process's address space.
3 Message Passing Programming Model
In distributed systems (no shared memory), applications may run on several processors, and communication is based on message passing.
Packages like MPI support message-based communication.
RPC and client-server models provide a high-level interface that makes message passing resemble procedure calls.
4 Distributed Shared Memory Introduction
Alternative: implement a software interface to let users access remote memory just like any other virtual memory reference.
That is, remote memory references are transparent to the application, just as page faults are transparent in virtual memory.
In theory, it should be possible to write the statement "X = Y + Z;" and have it execute correctly even though X, Y, and Z are stored on separate computers.
5 Distributed Shared Memory Introduction
Distributed Shared Memory (DSM) systems aim to provide this interface.
It is still necessary to use message passing to transfer data from one memory to another, but the DSM system handles it, not the application.
Software DSM (as opposed to hardware implementations) is useful for clusters, grids, and other loosely coupled systems.
Originally proposed by Li (the Ivy system) in 1986.
6 How Does it Work?
If an application generates an address that maps to local memory, the reference is satisfied locally (normal virtual memory).
If the address refers to a location on a remote machine, the data is automatically moved to the local machine, where it can then be accessed normally.
DSM can be implemented at the operating system level or with a library of functions that runs at the user level (middleware).
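The local-versus-remote decision above can be sketched in a few lines. This is a minimal illustration with hypothetical names (`DsmNode`, `fetch_remote`), not any real DSM implementation; in particular, a real system would locate the owner through a manager rather than by asking every node.

```python
# Minimal sketch of DSM fault handling (hypothetical names).
# A read is served locally if the page is resident; otherwise the
# page is fetched over the "network" and cached, then served locally.

class DsmNode:
    PAGE_SIZE = 4096

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network        # node_id -> DsmNode (stands in for the cluster)
        self.pages = {}               # page number -> bytes (resident pages)

    def read(self, addr):
        page_no, offset = divmod(addr, self.PAGE_SIZE)
        if page_no not in self.pages:            # the "page fault"
            self.pages[page_no] = self.fetch_remote(page_no)
        return self.pages[page_no][offset]       # satisfied locally from here on

    def fetch_remote(self, page_no):
        # Illustration only: ask every other node for the page.  A real DSM
        # would consult a page manager (see the later slides on locating pages).
        for node in self.network.values():
            if node is not self and page_no in node.pages:
                return node.pages[page_no]
        return bytes(self.PAGE_SIZE)             # demand-zero page
```

With two nodes, a read on node A of a page resident only on node B triggers a transfer; a second read of the same page is satisfied from A's local copy.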
7 Advantages – Big Picture
DSM can be used to support data sharing between processes running on separate computers in a distributed system.
DSM can (possibly) improve performance by speeding up data access.
8 Advantages
Hide message passing from the application.
Move data in large blocks that may be able to take advantage of locality.
Port programs from shared-memory multiprocessors with little or no change.
Clusters of workstations are cheaper and more scalable than shared-memory multiprocessors.
May do away with the need for disk-based virtual memory: high-speed network transfers may be cheaper than disk reads.
It may be possible to store everything in the combined memories of all the processors.
9 Implementation Issues
Locating remote data: protocol overhead, transmission delays.
Concurrent access at multiple nodes may cause data consistency problems (compare caches in a distributed file system).
Structure: should the DSM resemble ordinary virtual memory (a linear array), or should it be a collection of shared objects?
Granularity: how much data is transferred in a single operation?
If a multiple of the native page size is used, the paging hardware can interact with the DSM system.
False sharing is a problem, especially with larger block sizes.
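The false-sharing problem can be made concrete with a toy model. This sketch (all names are hypothetical) counts page migrations when two nodes alternately write two *unrelated* variables that happen to share a page:

```python
# Illustration of false sharing: variables x and y are unrelated, but
# both fall on page 0, so alternating writes by nodes A and B force
# the page to ping-pong between them even though the nodes never
# touch the same variable.
PAGE_SIZE = 4096
addr_x, addr_y = 0, 8            # both addresses lie in page 0

def page_of(addr):
    return addr // PAGE_SIZE

transfers = 0
owner = "A"                      # node currently holding page 0 in write mode

def write(node, addr):
    global owner, transfers
    if owner != node:            # page must migrate before the write proceeds
        transfers += 1
        owner = node

# Node A updates x, node B updates y, alternating:
for _ in range(4):
    write("A", addr_x)
    write("B", addr_y)
```

After the loop, seven of the eight writes caused a page transfer; with a smaller block size (or with x and y placed on different pages) none of them would have.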
10 Two Approaches to DSM: Unstructured
Based on the native paging system.
The combined memories of all processors are treated as if they were one shared memory.
A process's virtual memory pages could be stored on any machine.
Page faults can be satisfied locally or remotely.
Page sharing is supported, as it is in a shared-memory multiprocessor.
11 Two Approaches to DSM: Structured
Unstructured DSM shares on a page-by-page basis.
In structured DSM, programmers designate certain data structures/objects as "shared"; other data is private, is managed locally, and will not usually be stored at other sites.
12 Replication in DSM
Replication of shared data promotes increased parallelism, fewer page faults, and reduced network traffic, and is in general more efficient than non-replicated implementations.
Main problem: preserving consistency when multiple copies are present.
How and when do changes made at one node become visible on another node?
13 Consistency Semantics
Hardware cache coherence promises strict consistency (UNIX semantics): a read returns the most recent write.
Multiprocessors maintain cache coherence by broadcasting writes to all processors, which can then either update or invalidate their local caches.
Since software DSM can't efficiently implement the atomic broadcasts needed to preserve strict consistency, other consistency models are needed.
14 Sequential Consistency – Review
Formal definition: the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
In other words, the instructions of all processes are interleaved in some sequential order.
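The definition can be checked mechanically for a tiny example. This sketch enumerates the interleavings of two 2-operation programs that sequential consistency permits: any total order is legal as long as each process's own program order survives.

```python
# Sequential consistency for two toy programs: count the interleavings
# that keep each process's operations in program order.
from itertools import permutations

p1 = ["P1: W(x)=1", "P1: R(y)"]   # program order of process 1
p2 = ["P2: W(y)=1", "P2: R(x)"]   # program order of process 2

def respects_program_order(seq, prog):
    # The operations of `prog` must appear in `seq` in their original order.
    positions = [seq.index(op) for op in prog]
    return positions == sorted(positions)

valid = [s for s in permutations(p1 + p2)
         if respects_program_order(s, p1) and respects_program_order(s, p2)]
```

Of the 24 total orders of the four operations, exactly 6 respect both program orders (choose 2 of the 4 slots for P1's operations); all 6 are sequentially consistent executions, and a DSM enforcing sequential consistency may realize any of them.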
15 Sequential Consistency
For independent processes (no data sharing), this presents no problem; for critical sections, there is a possibility of race conditions.
Users who wish to enforce a certain order of execution can use synchronization mechanisms, just as they would in a shared-memory multiprocessor.
Unstructured DSM usually enforces sequential consistency.
16 Consistency in Structured DSM
Structured, or object-based, DSM can use more efficient consistency semantics because it is easier to specify what is shared.
Users share only designated objects or variables; shared pages, by contrast, may contain both shared and unshared data.
Users can identify points in the program where the data must be consistent.
17 Consistency in Structured DSM
If shared-data accesses occur only inside critical sections, the DSM only needs to ensure that variables are consistent when a process enters a critical section (or when it exits).
We also assume that processes have a way to "lock" a critical section – by a centralized or distributed mutual exclusion algorithm, for example.
18 Two Consistency Models
Release consistency: when a process exits a critical section, new values of the variables are propagated to all sites. No updates need to occur during the critical section, because no other process can see the data then.
Entry consistency: when a process enters a critical section, it updates the values of the shared variables.
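The release-consistency half of the slide can be sketched as follows. All names (`ReleaseConsistentVar`, `write`, `release`) are hypothetical illustrations, not a real DSM API; the point is only the *timing*: updates are buffered inside the critical section and pushed to every replica at the release.

```python
# Sketch of release consistency: a write inside the critical section
# is buffered locally; release() propagates it to all replicas.
# (Entry consistency differs in direction: the *acquiring* site would
# pull current values at lock entry instead.)

class ReleaseConsistentVar:
    def __init__(self):
        self.replicas = {"A": 0, "B": 0}   # per-site copies of the variable
        self.pending = {}                  # site -> value buffered since acquire

    def write(self, site, value):
        self.pending[site] = value         # no propagation yet

    def release(self, site):
        if site in self.pending:
            v = self.pending.pop(site)
            for s in self.replicas:        # push new value to every replica
                self.replicas[s] = v

x = ReleaseConsistentVar()
x.write("A", 42)                 # inside A's critical section
stale = x.replicas["B"]          # B still sees the old value...
x.release("A")
fresh = x.replicas["B"]          # ...and the new value after A's release
```

This is safe precisely because, as the slide notes, no other process can legitimately read the variable while A holds the critical section.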
19 Summary
Consistency is not an issue if pages/objects aren't replicated.
It is also not an issue if only read-only pages are replicated.
If read-write pages are replicated, consistency must be addressed.
The issue: when and how to propagate updates to all sites that have replicas of the data.
20 Summary
Sequential consistency sends each update to all sites in program order, although not necessarily immediately.
Entry consistency propagates updates to a site Si when Si enters a critical section (the update must be requested).
Release consistency propagates updates to all other sites when a process leaves its critical section.
In each case, "all sites" means "all sites that have a copy of the updated data".
21 Evaluation
A good idea that offered a plausible alternative to message passing models on clusters.
Much research, but not widely adopted.
Recently, interest has revived because of grid computing.
22 Locating Pages (as proposed by the developers of the Ivy system)
Three approaches:
Centralized manager
Fixed distributed manager
Dynamic distributed manager
This solution was designed for page-based memory systems that use the read-replicate approach. Structured, or object-based, systems can use a similar technique.
23 Centralized Manager
A central manager has information about the owner of each page.
When a process faults, the local "mapping manager" (software that runs on each machine) contacts the central manager.
The manager is updated if page ownership changes. (The owner is usually the last process to have the page in write mode.)
24 Fixed Distributed Manager
Each processor knows the owners of a subset of the pages; together, all processors know all owners.
The function that determines which processor holds the ownership data for a page also determines how a faulting processor locates the owner.
Compare to the Chord algorithm.
Here, as in the centralized approach, concurrent requests are serialized at the manager (access is FCFS).
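The "function" the slide refers to can be as simple as a hash. This sketch (node names and the mod-N hash are illustrative assumptions) shows how every processor can compute, without any messages, which node holds a page's ownership record – the same key-to-node idea Chord uses, but over a static node set:

```python
# Sketch of the fixed distributed manager's lookup function: a fixed
# hash maps each page number to the node that keeps its ownership
# record, so a faulting node computes the manager's identity locally.
NODES = ["n0", "n1", "n2", "n3"]      # static set of processors

def manager_of(page_no):
    return NODES[page_no % len(NODES)]   # simple mod-N mapping (assumption)

# Every node computes the same manager for page 9, with zero messages:
which = manager_of(9)
```

The faulting node then sends its request to `manager_of(page)`, which serializes concurrent requests for that page just as a centralized manager would.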
25 Dynamic Distributed Manager
Each processor tracks the ownership of the pages currently in its page table.
Page tables are augmented with a probable-owner (probowner) field.
The field's contents may or may not be up to date (in other words, treat them as hints).
Page faults are directed to the processor in the probowner field, which either satisfies the fault or forwards the request elsewhere.
26 The Ivy System
Page based.
Enforces multiple readers, single writer – supports "strict" consistency (within the limits mentioned earlier).
All read-only copies of a page are invalidated before a write operation is allowed to continue.
On any processor, a page in the system-wide DSM can be read-only, write, or nil (invalid: out of date or not present).
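The invalidate-before-write rule can be sketched directly from the three per-node page states the slide lists (`read-only`, `write`, `nil`). The state table and function names here are illustrative, not Ivy's actual code:

```python
# Sketch of Ivy's multiple-readers / single-writer rule for one page:
# before a write proceeds, every other copy is invalidated (set to nil),
# so at most one node ever holds the page in write mode.

copies = {"A": "read-only", "B": "read-only", "C": "read-only"}

def write_fault(writer):
    for node in copies:
        if node != writer:
            copies[node] = "nil"        # invalidate all remote copies
    copies[writer] = "write"            # writer gets the sole writable copy

write_fault("A")
```

After A's write fault, B and C hold the page in the nil state; a later read on B would fault and fetch the up-to-date page from A, which is how the "a read returns the most recent write" guarantee is preserved.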
27 Real World Systems
Complete systems
Add-ons to existing systems
28 TreadMarks
http://www.cs.rice.edu/~willy/TreadMarks/overview.html
Supports parallel computing on networks of workstations and clusters.
Main feature: "provides a global shared address space across the different machines on a cluster."
Contrast with packages such as PVM or MPI, which provide a message passing interface between machines.
The shared memory interface lets programmers focus on algorithms instead of communication.
The research project appears to have terminated.
29 Other Sources
DSM 2006: The Sixth Annual Conference on Distributed Shared Memory.
J.P. Ryan and B.A. Coghland, "Distributed Shared Memory in a Grid Environment," Parallel Computing: Current and Future Issues in Parallel Computing, 2006.
First Annual Conference on Data Sharing, Consistency and DSM.