Distributed Shared Memory

Presentation on theme: "Distributed Shared Memory"— Presentation transcript:

1 Distributed Shared Memory

2 Two basic methods for IPC
Original sharing, or the shared-data approach
Copy sharing, or the message-passing approach

3

4 Message passing basic primitives are:
Send(recipient, data)
Receive(data)
Shared-memory basic primitives are:
data = Read(address)
Write(data, address)
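
A minimal sketch of the two styles, assuming Python's standard multiprocessing module (the worker names are made up): one pair of processes communicates by copying data over a pipe, the other by reading and writing a shared word.

```python
# Illustrative contrast of the two IPC styles using Python's multiprocessing module.
from multiprocessing import Pipe, Process, Value

def mp_worker(conn):
    data = conn.recv()          # Receive(data)
    conn.send(data + 1)         # Send(recipient, data)

def sm_worker(shared):
    with shared.get_lock():     # Write(data, address) on a shared word
        shared.value += 1

if __name__ == "__main__":
    # message passing: data is copied between address spaces
    parent, child = Pipe()
    p = Process(target=mp_worker, args=(child,))
    p.start()
    parent.send(41)             # Send(recipient, data)
    print(parent.recv())        # Receive(data) -> 42
    p.join()

    # shared data: both processes read/write the same memory word
    x = Value("i", 41)
    q = Process(target=sm_worker, args=(x,))
    q.start(); q.join()
    print(x.value)              # data = Read(address) -> 42
```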

5 Multiple processor systems
Multiprocessors (tightly coupled): multiple CPUs share a common main memory; the shared-data approach is used.
Multicomputers (loosely coupled): each CPU has its own memory, and data is shared by message passing.

6 A tightly coupled multiprocessor system
[Figure: a tightly coupled multiprocessor system (CPUs connected through interconnection hardware to a system-wide shared memory) and a loosely coupled multiprocessor system (each CPU with its own local memory, connected by a communication network).]
Tightly coupled system: if one processor writes the value 100 to location x, all processors see it.
Loosely coupled system: each processor has its own local memory.

7 H/W point of view
Designing a system in which many processors use the same memory simultaneously is difficult, while large multicomputers are easier to build. So from the hardware point of view, multicomputers are preferable to multiprocessor systems.

8 S/W point of view
In multiprocessor systems, one process communicates simply by writing data to memory, to be read by others. Various techniques, such as critical sections and semaphores, are available to manage memory access and IPC.

9 In multicomputers, IPC is achieved by message passing, which involves tricky programming and issues such as lost messages, buffering, and blocking. Some of these issues are reduced with RPC, but RPC cannot easily transfer complex data structures containing pointers.

10 Thus multicomputers are easier to build but harder to program, while multiprocessor systems are complex to build but easier to program. Distributed shared memory (DSM) is an attempt to build a system that is both easy to build and easy to program.

11 A DSM system can be viewed as a set of interconnected nodes that share a view of a single logical shared memory, even though the memories are, in reality, distributed.

12

13 A reference to a local page is done in hardware
A reference to a local page is handled in hardware. An attempt to reference a page on a different machine causes a page fault. The OS then sends a message to the remote machine, which finds the page and returns it to the requesting machine. This design is similar to traditional virtual memory systems; the difference is that instead of getting the page from disk, the OS gets it from another processor over the network.
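
A toy sketch of this fault handling, with made-up class names and an in-memory stand-in for the network: a reference to a non-resident page triggers a fetch from the remote holder, after which the access proceeds locally.

```python
# Hypothetical DSM page-fault handling sketch, not a real OS implementation.
PAGE_SIZE = 4096

class RemoteNode:
    """Stands in for the machine that currently holds the page."""
    def __init__(self):
        self.pages = {}

    def fetch_page(self, page_no):
        # in a real system this request/reply travels over the network
        return self.pages.setdefault(page_no, bytearray(PAGE_SIZE))

class DsmNode:
    def __init__(self, remote):
        self.local_pages = {}    # pages currently resident on this machine
        self.remote = remote

    def read(self, address):
        page_no, offset = divmod(address, PAGE_SIZE)
        if page_no not in self.local_pages:              # "page fault"
            # the OS sends a message to the remote machine and installs the page
            self.local_pages[page_no] = self.remote.fetch_page(page_no)
        return self.local_pages[page_no][offset]

node = DsmNode(RemoteNode())
print(node.read(8192))   # faults, fetches page 2 from the remote node, returns 0
```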

14

15 [Figure: the shared-memory abstraction; a process accesses it with data = read(address) and write(address, data).]

16 General architecture of DSM systems
The DSM abstraction presents a large shared-memory space to the processors of all nodes, though the shared memory exists only virtually. A software memory-mapping manager maps local memory onto the shared virtual memory. The shared-memory space is partitioned into blocks.

17 When a process accesses data in a memory block of the shared-memory space, the local memory-mapping manager takes charge of the request. If the requested block is not in local memory, control is transferred to the OS. The OS sends a message to the node holding the block, and the block is migrated.

18 The blocks keep migrating from one node to another, but no communication is visible to the user processes. To the user processes, the system looks like a tightly coupled shared-memory multiprocessor system in which multiple processes can read and write the shared memory at will. Data caching is used by the nodes to reduce network traffic.

19 Design and implementation issues of DSM
Granularity
Structure of the shared-memory space
Memory coherence and access synchronization
Data location and access
Replacement strategy
Thrashing
Heterogeneity

20 Granularity Granularity refers to the block size of a DSM system. The block size may be a few words, a page, or a few pages. Choosing a proper block size is important, since it usually determines the granularity of parallelism exploited and the amount of network traffic generated.

21 Granularity Factors influencing block size selection:
Paging overhead: small for large block sizes due to locality of reference.
Directory size
Thrashing
False sharing: may lead to the thrashing problem.

22 Thrashing: occurs when data items in the same data block are updated by multiple nodes at the same time, causing a large number of data-block transfers among the nodes without much progress in the execution of the application.

23 False sharing
[Figure: process P1 accesses data in one area of a data block while another process accesses a different area of the same block, so the block is shared even though the data items are not.]
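
A small illustration (the block size and access pattern are assumptions) of why false sharing causes thrashing: two nodes update unrelated variables that happen to fall in the same block, so ownership of the whole block bounces on every access.

```python
# Hypothetical sketch: counting block transfers caused by false sharing.
BLOCK_SIZE = 1024

def block_of(address):
    return address // BLOCK_SIZE

addr_a, addr_b = 0, 8        # two unrelated variables that fall in the same block
accesses = [("node1", addr_a), ("node2", addr_b)] * 4    # interleaved updates

owner, transfers = None, 0
for node, addr in accesses:
    if owner != node:        # the whole block must migrate before the write
        owner, transfers = node, transfers + 1

print(block_of(addr_a) == block_of(addr_b), transfers)   # True 8
```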

24 Structure of shared-memory space
Structure defines the abstract view of the shared-memory space presented to the application programmers of a DSM system. One DSM system may appear to its programmers as storage for words, while programmers of another DSM system may view it as storage for data objects. Commonly used approaches are: no structuring, structuring by data type, and structuring as a database.

25 No structuring Most DSM systems do not structure their shared-memory space; it is simply a linear array of words. The advantage is that it is convenient to choose any suitable page size as the unit of sharing, and a fixed grain size may be used. Such a DSM system is simple and easy to design.

26 Structuring by data type
The shared-memory space is structured as a collection of objects or variables. In the object-based approach, memory is accessed only through class routines, and therefore object-oriented semantics can be used when implementing the system.

27 [Figure: structuring by data type; processes access shared objects held in a common object space.]

28 Structuring as database
The shared-memory space is ordered as an associative memory called a tuple space. A process selects tuples by specifying the number of their fields and their values or types. Special access functions are required to interact with the shared-memory space.

29 Tuple Space A tuple consists of one or more fields, each of which is a value of some type supported by the base language. For example: ("abc", 2, 5), ("Person", "Doe", "John", 23, 65), ("pi", 3.14159).

30 Operations on Tuples
Out puts a tuple into the tuple space, e.g. out("Person", "Doe", "John", 23, 65).
In retrieves a tuple from the tuple space, e.g. in("Person", ?Last_name, ?First_name, 23, ?Weight).
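
A toy tuple space, illustrative only (a real system such as Linda blocks in in() and supports typed formals); here None plays the role of a '?' formal field.

```python
# Hypothetical tuple-space sketch; None stands in for a '?' formal field.
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, *tup):
        self.tuples.append(tup)

    def _matches(self, tup, template):
        return len(tup) == len(template) and all(
            t is None or t == f for f, t in zip(tup, template))

    def in_(self, *template):
        for tup in self.tuples:
            if self._matches(tup, template):
                self.tuples.remove(tup)
                return tup
        return None                      # a real in() would block instead

ts = TupleSpace()
ts.out("Person", "Doe", "John", 23, 65)
ts.out("abc", 2, 5)
print(ts.in_("Person", None, None, 23, None))  # ('Person', 'Doe', 'John', 23, 65)
```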

31 Consistency models
When data are replicated, we need to coordinate the read/write actions to assure the desired degree of consistency.
Data is read from local copies of the data, but writes/updates must be propagated to the other copies.
Memory consistency models determine when data updates are propagated and what level of inconsistency is acceptable.

32 Consistency models A consistency model refers to the degree of consistency that has to be maintained for the shared-memory data for the memory to work correctly for a certain set of applications. It is defined as a set of rules that applications must obey to get the desired consistency.

33 Strict Consistency Model
This is the strongest form of memory coherence. A shared-memory system is said to be strictly consistent if the value returned by a read operation on a memory address is always the same as the value written by the most recent write operation to that address, irrespective of the locations of the processes performing the read/write operations. That is, all writes must become instantaneously visible to all processes.

34 Uniprocessors traditionally observe strict consistency
e.g. a = 1; a = 5; print(a); prints 5. In a DSM the matter is complicated, and the strict consistency model is practically impossible to implement.

35 Strict Consistency Example
W(x)5: write value 5 to x. R(x)5: read x and return 5.
[Figure: two timing diagrams, one strictly consistent (every read returns the value of the most recent write) and one not strictly consistent.]

36 Strict Consistency Example
[Figure: P1 performs W(x)3 and P2 performs W(x)5; reads by P3 and P4 return either 3 or 5. Under strict consistency a read must return the value of the most recent write at the instant the read is performed.]

37 [Figure: two timing diagrams over P1-P3 after W(x)2. Strict consistency: every subsequent read returns 2. Violation of strict consistency: a read issued after the write still returns the old value 0.]

38 [Figure: timing diagram over P1-P4: W1(x)2 is followed by a read returning 2, then W2(x)5 is followed by reads returning 5.]

39 Sequential Consistency
All processes see the same order of all memory access operations on the shared memory. This is weaker than the strict consistency model because it does not guarantee that a read operation on a particular address always returns the value written by the most recent write operation to that address.
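
A small checker sketch with an assumed trace format: sequential consistency only requires that some single interleaving of the writes explains every process's sequence of reads.

```python
# Hypothetical check: is there one global order of the writes to x that explains
# every process's sequence of read values?
from itertools import permutations

writes = [3, 5]                              # values written to x
reads = {"P3": [5, 3], "P4": [5, 3]}         # values each process read, in order

def follows(order, seen):
    positions = [order.index(v) for v in seen]
    return positions == sorted(positions)    # reads never go back to an older write

sequentially_consistent = any(
    all(follows(order, seen) for seen in reads.values())
    for order in permutations(writes))
print(sequentially_consistent)               # True: both processes saw 5 before 3
```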

40 Sequential Consistency Example
[Figure: example orderings. The first two are both valid, but all processes must see the same sequence; the last ordering is not valid.]

41 Sequential Consistency Example
[Figure: P1 performs W(x)3 and P2 performs W(x)5. Sequentially consistent: P3 and P4 read the two values in the same order (e.g. both read 5 then 3). Not sequentially consistent: P3 reads 5 then 3 while P4 reads 3 then 5.]

42 Sequential Consistency
[Figure: timing diagram over P1-P4 with writes W1(x)2 and W2(x)5; the reading processes both observe 5 and then 2, agreeing on a single order of the two writes.]

43 Causal Consistency All processes see only those memory reference operations that are potentially causally related in the same (correct) order, i.e. writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. The system must keep track of which memory reference operation is dependent on which other memory reference operation.

44 Causal Consistency Example
[Figure: two timing diagrams, one causally consistent and one not causally consistent.]

45 Causal Consistency Example
[Figure: W(x)3 by P1 and W(x)5 by P2 are concurrent writes; P3 reads 3 then 5 while P4 reads 5 then 3. This is allowed under causal consistency, since concurrent writes may be seen in different orders.]

46 Causal Consistency Example
[Figure: P1 writes x = 3 (and later x = 8); P2 reads 3 and then writes x = 5, so W(x)3 and W(x)5 are causally related. Causally consistent: every process reads 3 before 5. Not causally consistent: a process reads 5 before 3.]

47 [Figure: two timing diagrams over P1-P4. Causally consistent: the causally related writes are seen by every process in the same order. Not causally consistent: some processes see the causally related writes in different orders.]

48 Pipelined RAM (PRAM) Consistency Model
The PRAM consistency model only ensures that all write operations performed by a single process are seen by all other processes in the order in which they were performed, as if all the write operations performed by a single process were in a pipeline.

49 PRAM Consistency Example
[Figure: two timing diagrams; one is invalid under PRAM consistency, the other is valid under PRAM (though invalid under causal consistency).]

50 PRAM Consistency Example
[Figure: P1 writes x = 3 and then x = 8; P2 reads 3 and then writes x = 5. Under PRAM consistency each process must see P1's writes in the order 3 then 8, while writes from different processes may be observed in different orders by P3 and P4.]

51 PRAM Consistency
[Figure: timing diagram over P1-P4 with several writes by P2 and P3; each reading process sees each writer's own writes in program order, but the writes of different processes may be interleaved differently at different readers.]

52 Processor Consistency
PRAM consistency + all writes to the same memory location must be visible to all processes in the same order.

53 Processor Consistency
[Figure: timing diagram; the writes to the same location x (a, then b) are seen in the same order by all reading processes, while writes to different locations may still be observed in different orders.]

54 Processor Consistency
[Figure: the same execution shown under processor consistency and under PRAM consistency; processor consistency additionally forces all processes to agree on the order of the writes to the same location.]

55 Weak Consistency Model
It takes advantage of two characteristics common to many applications: it is not necessary to show other processes the change in memory made by every individual write operation (e.g. writes inside a critical section), and isolated accesses to shared variables are rare. Better performance can be achieved if consistency is enforced on a group of memory reference operations rather than on each individual memory reference operation.

56 How can the system know that it is time to show the changes performed by a process to other processes? Programmers tell the system by using a special variable called a synchronization variable. When a synchronization variable is accessed by a process, the entire shared memory is synchronized: all changes made to the memory by all processes are made visible to all other processes. This provides better performance at the cost of putting an extra burden on the programmers.
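
A sketch of the idea (data structures are invented): writes stay in a per-process buffer, and only an access to the synchronization variable propagates all buffered changes so that every process can see them.

```python
# Hypothetical weak-consistency sketch: changes become visible only at sync points.
class WeaklyConsistentMemory:
    def __init__(self):
        self.shared = {}     # the synchronized view all processes agree on
        self.pending = {}    # per-process writes not yet propagated

    def write(self, proc, var, value):
        self.pending.setdefault(proc, {})[var] = value   # local only, for now

    def read(self, proc, var):
        # a process sees its own pending writes, otherwise the synchronized view
        return self.pending.get(proc, {}).get(var, self.shared.get(var))

    def sync(self, proc):
        # access to the synchronization variable: make every change visible to all
        for writes in self.pending.values():
            self.shared.update(writes)
        self.pending.clear()

mem = WeaklyConsistentMemory()
mem.write("P1", "x", 10)
print(mem.read("P2", "x"))    # None: P1 has not synchronized yet
mem.sync("P1")
print(mem.read("P2", "x"))    # 10
```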

57 Weak Consistency Example
[Figure: two timing diagrams, one weakly consistent and one not weakly consistent.]

58 [Figure: two timing diagrams over P1-P3 with writes W2(x)a, W2(x)b, W2(y)c and synchronization operations S1, S2, S3. Weakly consistent: reads performed after synchronizing return the synchronized values. Not weakly consistent: a read performed after synchronizing still returns a stale value.]

59 Release Consistency Model
In the weak consistency model, memory synchronization basically involves:
1. All changes made to the memory by the process are propagated to the other nodes.
2. All changes made to the memory by other processes are propagated from the other nodes to the process's node.
Operation 1 is needed when the process exits a critical section; operation 2 is needed when the process enters a critical section.

60 Since a single synchronization variable is used in the weak consistency model, the system cannot know whether a process is entering or exiting a critical section, so both operations 1 and 2 are performed whenever a process accesses a synchronization variable. The release consistency model uses two synchronization operations, acquire and release, to tell clearly whether a process is entering or exiting a critical section.
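
A sketch of how splitting synchronization in two halves the work (a shared dictionary stands in for the other nodes): release propagates local changes out (operation 1), acquire pulls remote changes in (operation 2).

```python
# Hypothetical release-consistency sketch built around a shared "master" copy.
class ReleaseConsistentNode:
    def __init__(self, master):
        self.master = master    # dict standing in for the memory of the other nodes
        self.local = {}
        self.dirty = {}

    def acquire(self):
        self.local.update(self.master)    # operation 2: pull remote changes in

    def write(self, var, value):
        self.local[var] = value
        self.dirty[var] = value

    def read(self, var):
        return self.local.get(var)

    def release(self):
        self.master.update(self.dirty)    # operation 1: push local changes out
        self.dirty.clear()

master = {}
n1, n2 = ReleaseConsistentNode(master), ReleaseConsistentNode(master)
n1.acquire(); n1.write("x", 7); n1.release()
n2.acquire()
print(n2.read("x"))    # 7: visible because n1 released and n2 then acquired
```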

61 Release Consistency Examples
[Figure: two timing diagrams, one release consistent and one not release consistent.]

62 Release Consistency Example
[Figure: P1 performs Acq1(L), W1(x)a, W1(x)b, Rel1(L); P2 then performs Acq2(L), reads x and obtains b, and Rel2(L).]
A variation on the release consistency model is "lazy release consistency". In this approach, the modifications are sent to other nodes only on demand.

63 Discussion of consistency models
It is difficult to grade the consistency models based on performance, because one application may perform well under one model while another application performs well under a different model. The choice depends on several factors, such as how easy the model is to implement and use and how much concurrency it allows.

64 Strict consistency is never used.
Sequential consistency is the most commonly used model (it puts no burden on the programmer; however, it suffers from low concurrency).
Causal, PRAM, processor, weak, and release consistency models are the main choices in the weak category (they put a burden on the programmer).
Weak and release consistency provide better concurrency (DSM systems were designed to reduce the burden on programmers).

65 Implementing Sequential Consistency Models
Protocols for implementing the sequential consistency model in a DSM system depend to a great extent on whether the DSM system allows replication and/or migration of shared-memory blocks. The designer of a DSM system may choose one of the following strategies:
Nonreplicated, nonmigrating blocks (NRNMBs)
Nonreplicated, migrating blocks (NRMBs)
Replicated, migrating blocks (RMBs)
Replicated, nonmigrating blocks (RNMBs)

66 NRNMBs Each block has a single copy whose location is always fixed.

67 Drawbacks: serializing data access creates a bottleneck, and because there is no replication, parallelism is not possible.
Data locating: since there is a single copy of each block in the entire system and the location of the data never changes, a simple mapping function is used to map a block to a node.
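
A minimal sketch of such a mapping function (the modulo scheme and node count are assumptions): because each block has exactly one fixed location, locating data is a pure computation with no lookup messages.

```python
# Hypothetical NRNMB data locating: block id -> owning node via a fixed mapping.
NUM_NODES = 4

def owner_node(block_id):
    # the location of a block never changes, so a simple modulo mapping suffices
    return block_id % NUM_NODES

def access(block_id):
    node = owner_node(block_id)
    return f"send request for block {block_id} to node N{node}"

print(access(10))    # send request for block 10 to node N2
print(access(13))    # send request for block 13 to node N1
```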

68 NRMBs Single copy of each block in the entire system.
Each access to a block causes the block to migrate from its current node to the node where it is accessed. Owner of the block changes dynamically.

69 Advantage: lower cost of migration if the application exhibits a high locality of reference.
Drawbacks: prone to thrashing problems; parallelism is not possible.

70 Data locating in NRMB strategy
In the NRMB strategy the location of a block keeps changing dynamically, so one of the following methods may be used to locate a block:
Broadcasting
Centralized-Server algorithm
Fixed Distributed-Server algorithm
Dynamic Distributed-Server algorithm

71 Broadcasting (NRMB)
[Figure: each node maintains an owned-blocks table listing the blocks it currently owns; a faulting node broadcasts its request, and all nodes must process the broadcast request.]

72 Broadcasting (NRMB)
[Figure: each node (Node 1 ... Node M) maintains an owned-blocks table, keyed by block address (changes dynamically), containing an entry for each block for which this node is the current owner.]

73 Centralized Manager
[Figure: the faulting node sends its request to the centralized server, which looks up the owner node in its (data block, owner node) table and forwards the request to it; the block is transferred to the faulting node, and the server updates its location information so that the faulting node (here N1) becomes the new owner.]

74 Centralized server (NRMB)
[Figure: a centralized server maintains a block table with an entry for each block in the shared address space: block address (remains fixed) and owner node (changes dynamically).]

75 Fixed Distributed Server
[Figure: there is a block manager on several nodes, each maintaining a (data block, owner node) table for a subset of the blocks. Using a mapping function, the fault handler finds the node whose block manager manages the currently accessed block; that block manager then handles the request in the same way as the centralized server.]

76 Fixed distributed server (NRMB)
[Figure: each block manager keeps a block table containing entries for a fixed subset of all blocks in the shared-memory space: block address (remains fixed) and owner node (changes dynamically).]

77 Dynamic Distributed Server
[Figure: there is no block manager; each node has a block table that contains ownership (probable owner) information for all blocks in the shared-memory space. A request is forwarded along the chain of probable owners until the true owner of the block is reached.]
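
A sketch of following the chain of probable owners (node names and table contents are invented): each hop asks the node its hint points to until the true owner is reached, and the hints along the path can then be updated to shorten future chains.

```python
# Hypothetical probable-owner chase for the dynamic distributed-server scheme.
# probable[node][block] -> the node this node believes owns the block.
probable = {
    "N1": {"B3": "N2"},
    "N2": {"B3": "N4"},
    "N4": {"B3": "N4"},    # N4 points at itself: it is the true owner
}

def locate(start, block):
    node, visited = start, []
    while probable[node][block] != node:    # follow the chain of ownership hints
        visited.append(node)
        node = probable[node][block]
    for hop in visited:                     # shorten the chain for future requests
        probable[hop][block] = node
    return node

print(locate("N1", "B3"))       # N4
print(probable["N1"]["B3"])     # N4: the hint now points straight at the owner
```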

78 Dynamic distributed server (NRMB)
[Figure: each node (Node 1 ... Node M) has a block table containing an entry for every block in the shared-memory space: block address (remains fixed) and probable owner node (changes dynamically).]

79 Replicated, Migrating Blocks
Nonreplication strategies lack parallelism. Replication tends to increase the cost of write operations, because for a write to a block, all its replicas must be invalidated or updated to maintain consistency. Two basic protocols are used to ensure sequential consistency: write-invalidate and write-update.

80 Write-Invalidate

81 All copies of a piece of data except one are invalidated before a write operation can be performed on it. When a write fault occurs at a node, its fault handler copies the accessed block from one of the block's current holders to its own node, invalidates all other copies of the block by sending an invalidate message containing the block address to the nodes having a copy, changes the access right of the local copy of the block to write, and returns to the faulting instruction. The node now "owns" that block.
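
A sketch of the write-fault handling just described, over an invented in-memory model of the copies: fetch a copy, invalidate every other replica, then grant the writer ownership.

```python
# Hypothetical write-invalidate handler over an in-memory model of the copies.
class Block:
    def __init__(self, owner, data=0):
        self.owner = owner
        self.copies = {owner: data}    # node -> cached value of the block

def write_fault(block, node, new_value):
    value = block.copies[block.owner]  # 1. copy the block from a current holder
    block.copies = {node: value}       # 2. invalidate all other copies
    block.copies[node] = new_value     # 3. perform the write with write access
    block.owner = node                 # the faulting node now owns the block

b = Block(owner="N1", data=5)
b.copies["N3"] = 5                     # N3 also held a read copy
write_fault(b, "N2", 9)
print(b.owner, b.copies)               # N2 {'N2': 9} -- copies at N1 and N3 gone
```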

82 Write-update

83 A write operation is carried out by updating all copies of the data on which the write is performed.
When a node writes a local copy of a block, it updates all the copies of the block by sending the address of the modified memory location and its new value to the other nodes having a copy. The write operation completes only after all the copies of the block have been successfully updated. After the write, all nodes that had a copy of the block before the write also have a valid copy afterwards.
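
A matching sketch of write-update with the same kind of invented data structures: the new value of the modified location is pushed to every node holding a copy, and the write completes only when all of them are updated.

```python
# Hypothetical write-update: propagate the new value to every node holding a copy.
replicas = {                       # node -> local copy of the block (dict of words)
    "N1": {"x": 5, "y": 1},
    "N2": {"x": 5, "y": 1},
    "N3": {"x": 5, "y": 1},
}

def write_update(writer, address, value):
    replicas[writer][address] = value      # write the local copy
    for node, copy in replicas.items():
        if node != writer:
            copy[address] = value          # send (address, new value) to each copy holder
    # the write completes only after all copies have been updated

write_update("N1", "x", 9)
print(all(copy["x"] == 9 for copy in replicas.values()))   # True
```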

84 Global Sequencing Mechanism

85 The write-update approach is very expensive in loosely coupled systems, as it requires network access on every write operation. In the write-invalidate approach, data is propagated only when it is read, and several updates can take place before communication is necessary. Therefore most DSM systems use the write-invalidate protocol. A status tag may be associated with each block (valid, read-only, writable, ...).

86 Data location in RMB strategy
In the write-invalidate protocol the issues are: locating the owner of the block and keeping track of the nodes that currently have a valid copy of the block. One of the following methods is used:
Broadcasting
Centralized-Server algorithm
Fixed Distributed-Server algorithm
Dynamic Distributed-Server algorithm

87 Broadcasting (RMB)
[Figure: each node keeps an owned-blocks table with a (block address, copy-set) entry for every block it owns, e.g. block B4 with copy-set {N4, N1}. A faulting node broadcasts its request, and all nodes must process the broadcast request.]

88 Broadcasting (RMB)
[Figure: each node (Node 1 ... Node M) maintains an owned-blocks table (changes dynamically) containing, for each block for which this node is the owner, the block address and the copy-set.]

89 Centralized Server RMB
[Figure: the centralized server maintains a block table with an entry for each block in the shared-memory space: block address (remains fixed), owner node (changes dynamically), and copy-set.]

90 Centralized Server RMB
Read fault: the centralized server adds the faulting node N to the block's copy-set and returns the owner-node information to node N.
Write fault: the server returns both the copy-set and the owner-node information to node N and then initializes the copy-set field to contain only node N.
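
A sketch of the two cases handled by the centralized server (the table contents are invented, and it is assumed that the faulting writer becomes the new owner): a read fault grows the copy-set, a write fault hands the old copy-set to the faulting node and resets it to that node alone.

```python
# Hypothetical centralized-server block table for the RMB strategy.
block_table = {
    "B1": {"owner": "N1", "copy_set": {"N1", "N3"}},
}

def read_fault(block, faulting_node):
    entry = block_table[block]
    entry["copy_set"].add(faulting_node)    # the faulting node will hold a copy
    return entry["owner"]                   # it fetches the block from the owner

def write_fault(block, faulting_node):
    entry = block_table[block]
    old_owner, old_copies = entry["owner"], set(entry["copy_set"])
    entry["owner"] = faulting_node          # assumed: the writer becomes the owner
    entry["copy_set"] = {faulting_node}     # reset: only the writer keeps a copy
    return old_owner, old_copies            # the writer invalidates these copies

print(read_fault("B1", "N2"))               # N1
print(write_fault("B1", "N4"))              # ('N1', {...previous copy holders...})
print(block_table["B1"])                    # owner N4, copy-set {'N4'}
```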

91 Fixed Distributed server RMB
[Figure: a block manager on each of several nodes maintains a block table containing entries for a fixed subset of all blocks in the shared-memory space: block address (remains fixed), owner node, and copy-set (change dynamically).]

92 Dynamic dist server RMB
[Figure: each node (Node 1 ... Node M) maintains a block table with an entry for every block in the shared-memory space: block address (remains fixed), probable owner (changes dynamically), and copy-set. The copy-set field has a value only if this node is the true owner of the corresponding block.]

93 Replicated, Nonmigrating Blocks
In RNMB strategy a block may be replicated at multiple nodes, but the location of each replica is fixed. A protocol similar to write-update is used to keep all the replicas consistent. Sequential consistency is ensured by using a global sequencer to sequence the write operations of all nodes.

94 Data Location A block table, having an entry for each block in the shared-memory space, is maintained at each node; each entry maps a block to one of its replica locations. A sequence table, also having an entry for each block in the shared-memory space, is maintained with the sequencer; each entry has three fields: block address, replica set, and sequence number.
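
A sketch of write sequencing under RNMB (the networking is elided): the sequencer stamps each write with the next sequence number and sends it to every replica, so all replicas apply the writes in one global order.

```python
# Hypothetical global sequencer for replicated, nonmigrating blocks (RNMB).
sequence_table = {
    "B1": {"replica_set": ["N1", "N2", "N3"], "seq_no": 0},
}
replica_memory = {n: {} for n in ["N1", "N2", "N3"]}    # node -> block contents

def sequenced_write(block, address, value):
    entry = sequence_table[block]
    entry["seq_no"] += 1                    # one global order for all writes
    for node in entry["replica_set"]:       # multicast the numbered write to replicas
        replica_memory[node][address] = value
    return entry["seq_no"]

print(sequenced_write("B1", "x", 7))        # 1
print(sequenced_write("B1", "x", 9))        # 2
print(replica_memory["N2"]["x"])            # 9 on every replica
```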

95 [Figure: each node (Node 1 ... Node M) has a block table (remains fixed) mapping each block address to a replica node. The centralized sequencer holds a sequence table containing, for each block in the shared-memory space, the block address, the replica nodes, and a sequence number that is incremented by 1 for every new modification of the block.]

96 Replacement Strategies
In DSM systems that allow shared-memory blocks to be dynamically migrated/replicated, the following issues must be handled: Which block should be replaced to make room for a newly required block? Where should the replaced block be placed?

97 Replacement Strategies
Classification of replacement algorithms:
Based on usage: usage based (e.g. LRU) and non-usage based (e.g. FIFO)
Based on space: fixed space and variable space

98 Some DSM systems differentiate the status of data items and use a priority mechanism. E.g. the classification of data blocks can be: unused, nil, read-only, read-owned, writable.
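
A sketch of such a priority mechanism (the numeric priorities are an assumption): when space is needed, blocks are considered for replacement in the order unused, nil, read-only, read-owned, writable, so the cheapest blocks to discard go first.

```python
# Hypothetical priority-based replacement: the lowest-priority block goes first.
PRIORITY = {"unused": 0, "nil": 1, "read-only": 2, "read-owned": 3, "writable": 4}

def pick_victim(blocks):
    """blocks: block id -> status tag; returns the block to replace."""
    return min(blocks, key=lambda b: PRIORITY[blocks[b]])

resident = {"B1": "writable", "B2": "read-only", "B3": "nil", "B4": "read-owned"}
print(pick_victim(resident))    # B3: a nil block is cheapest to discard
```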

99 Where to place a replaced block?
Discarding a read-owned or writable block for which there is no replica on any other node may lead to loss of useful data. Two commonly used approaches are:
- Secondary storage
- Memory space of other nodes

100 Thrashing Thrashing is said to occur when the system spends a large amount of time transferring shared data blocks from one node to another, compared to the time spent doing the useful work of executing application processes. Thrashing may occur when a block moves back and forth in quick succession or read-only blocks are repeatedly invalidated soon after they are replicated.

101 Solutions Providing application-controlled locks
Nailing a block to a node for a minimum amount of time
Tailoring the coherence algorithm to the shared-data usage patterns
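
A sketch of the "nailing" idea from the list above (the time threshold is arbitrary): once a block arrives at a node it may not migrate away until a minimum residence time has passed, which stops it from ping-ponging on every alternate access.

```python
# Hypothetical anti-thrashing rule: a block must stay for at least MIN_RESIDENCE seconds.
import time

MIN_RESIDENCE = 0.5              # tunable; chosen arbitrarily for the sketch
arrival_time = {}                # block id -> time the block arrived at this node

def on_block_arrival(block):
    arrival_time[block] = time.monotonic()

def may_migrate(block):
    return time.monotonic() - arrival_time[block] >= MIN_RESIDENCE

on_block_arrival("B7")
print(may_migrate("B7"))         # False: a remote request would be deferred
time.sleep(MIN_RESIDENCE)
print(may_migrate("B7"))         # True: the block may now migrate away
```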

102 Heterogeneous DSM
Issues: data conversion and block size selection.

103 Advantages of DSM Simpler abstraction
Better portability of distributed application programs
Better performance of some applications
Flexible communication environment
Ease of process migration

