University of Tampere, CS Department Distributed Transaction Management Jyrki Nummenmaa


1 Distributed Transaction Management Jyrki Nummenmaa jyrki.nummenmaa@cs.uta.fi

2 How are distributed txns born? Distributed database management systems: a client contacts a local server, which takes care of transparent distribution – the client does not need to know about the other sites. Alternative: a client contacts all local resource managers directly. Alternative: separate client programs are started on all the respective sites. Typically the client executing on the first site acts as a coordinator communicating with the other clients.

3 Vector timestamps (continuing from last lecture)

4 Assigning vector timestamps Initially, assign [0,...,0] to myVT. The id of the local site is referred to as ”self”. If event e is the receipt of message m, then: For i=1,...,M, assign max(m.VT[i],myVT[i]) to myVT[i]. Add 1 to myVT[self]. Assign myVT to e.VT. If event e is the sending of m, then: Add 1 to myVT[self]. Assign myVT to both e.VT and m.VT.
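
A minimal Java sketch of these assignment rules; the class and method names (VectorClock, onSend, onReceive) are illustrative assumptions, not part of the course code.

public class VectorClock {
    private final int[] myVT;  // one entry per site, initially all zeros
    private final int self;    // index of the local site

    public VectorClock(int numSites, int self) {
        this.myVT = new int[numSites];
        this.self = self;
    }

    // Sending event: add 1 to myVT[self]; the result is both e.VT and m.VT.
    public synchronized int[] onSend() {
        myVT[self]++;
        return myVT.clone();
    }

    // Receipt event: take the component-wise maximum with the message's
    // timestamp, then add 1 to myVT[self]; the result is e.VT.
    public synchronized int[] onReceive(int[] msgVT) {
        for (int i = 0; i < myVT.length; i++) {
            myVT[i] = Math.max(msgVT[i], myVT[i]);
        }
        myVT[self]++;
        return myVT.clone();
    }
}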

5 Vector timestamps [Figure: four site timelines T1–T4 exchanging messages m1–m9, with each send and receipt event labelled by the vector timestamp the above rules assign, from [1,0,0,0] and [0,1,0,0] up to [3,3,3,9].]

6 Vector timestamp order e.VT ≤V e'.VT if and only if e.VT[i] ≤ e'.VT[i] for 1 ≤ i ≤ M (M = number of sites). e.VT <V e'.VT if and only if e.VT ≤V e'.VT and e.VT ≠ e'.VT. Examples: [0,1,2,3] ≤V [0,1,2,3] and [0,1,2,2] <V [0,1,2,3]. The order of [1,1,2,3] and [0,1,2,4] is not defined; they are concurrent.
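
The comparison itself is just a component-wise test; a small sketch in the same spirit (class and method names are assumptions):

public final class VTOrder {
    // a ≤V b: every component of a is ≤ the corresponding component of b.
    public static boolean leq(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
        }
        return true;
    }

    // a <V b: a ≤V b and a ≠ b.
    public static boolean lt(int[] a, int[] b) {
        return leq(a, b) && !java.util.Arrays.equals(a, b);
    }

    // Concurrent: neither a <V b nor b <V a, as with [1,1,2,3] and [0,1,2,4].
    public static boolean concurrent(int[] a, int[] b) {
        return !leq(a, b) && !leq(b, a);
    }
}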

7 Vector timestamps - properties Vector timestamps guarantee that if e <H e', then e.VT <V e'.VT. This follows from the definition of the happens-before relation by observing the path of events from e to e'. Vector timestamps also guarantee the converse: if e.VT <V e'.VT, then e <H e' (why?).

8 Example Lock Manager Implementation

9 Example lock manager Software to manage locks (there are no actual data items). The software is not written for any particular application; it just demonstrates the concepts. The (non-existing) items are identified by numbers. Users may start and stop transactions and take and release locks using an interactive client. From the user’s point of view, the interactive client passes the commands to the lock managers. The number of managers to consult for a shared lock is given at startup, and the number needed for an exclusive lock is computed from it. Lock managers queue lock requests and grant locks once they become available.
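
The slides do not state how the exclusive quorum is computed. A hedged sketch under a standard quorum assumption: a shared and an exclusive quorum must overlap (s + x > n), and two exclusive quorums must overlap (2x > n), so that conflicting locks always meet at some manager.

final class LockQuorum {
    // Smallest exclusive quorum satisfying both overlap conditions
    // (an assumption; the course code may compute it differently).
    static int exclusiveQuorum(int nManagers, int sharedQuorum) {
        int x = nManagers - sharedQuorum + 1;    // ensures s + x > n
        return Math.max(x, nManagers / 2 + 1);   // ensures 2x > n
    }
    // Example: with 5 managers and a shared quorum of 2, the exclusive
    // quorum is 4, so it meets every shared and every exclusive quorum.
}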

10 Lock manager classes See the javadocs. LockManager manages locks. An LMMultiServerThread is created for each client. LockManagerProtocol talks with a client and calls LockManager methods. StartLockManager just starts a LockManager. Lock – a single lock object. Transaction – a single transaction object. Waitsfor – a single lock waiting for another.

11 Lock client classes StartLockClient – starts a lock client. LockClientController – the textual user interface. It is separate from the LockClient because it uses blocking input for the user: it just reads standard input and sends the commands to a LockClient. LockClient – a client talking to a LockManager. The client opens a socket to talk to a lock manager, and reads input from both the lock manager and the client controller.

12 LockClient implementation In addition to the communication data, the LockClient keeps track of its txn id and its lock requests (pending and granted). Output from LockManagers is just printed out, with the exception that if it contains a new txn id, the id is stored. The client keeps track of the lock requests from the client controller, as sketched below: –When a new lock request comes in, it is sent to the appropriate LockManagers. –When an unlock request comes in, the client checks that there is a matching lock request and sends the unlock to the LockManagers. NewTxn requests are only sent to one LockManager.
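
An illustrative sketch of this client-side bookkeeping; the real class is LockClient (see the javadocs), so the names and structures here are assumptions.

import java.util.HashSet;
import java.util.Set;

class LockRequestBook {
    long txnId = -1;  // stored when a LockManager reports a new txn id
    private final Set<Integer> pending = new HashSet<>();  // items with outstanding requests
    private final Set<Integer> granted = new HashSet<>();

    // A new lock request from the controller: remember it, then send it
    // to the appropriate LockManagers.
    void requested(int item) { pending.add(item); }

    // The LockManagers granted the lock on this item.
    void markGranted(int item) { if (pending.remove(item)) granted.add(item); }

    // An unlock is forwarded only if a matching lock request actually exists.
    boolean mayUnlock(int item) { return pending.contains(item) || granted.contains(item); }
}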

13 More on input/output Each LMMultiServerThread talks with only one client. Each client talks to several servers (LMMultiServerThreads). An interactive client talks and also listens to a client controller, but it does not read standard input itself: a blocking read would stall it, and if you just poll input with .ready(), user input is not echoed on the screen :( A client controller only reads standard input and sends the messages to an interactive client.

14 LockManager data structures A Vector for the currently locked objects. A Vector for the queued lock requests. A Vector for locks granted from the queue and waiting to be notified to the clients. A Vector for the transactions.

15 LockManager locking The lock queue is checked. If there are xlocks preventing the locking, the request is put into the queue; otherwise the lock is added. When locks are released, the queue is checked and the first possible queued lock is granted. If other locks can be granted as well, they are also granted. Those locks are added to the notify queue. As an LMMultiServerThread communicates with its client, it also regularly calls the LockManager’s checkNotify() method to see if there are locks granted from the queue waiting to be notified. This logic is sketched below.
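
A compact sketch of this grant-from-queue logic; the course code uses Vectors and its own Lock class, so the names and structures here are illustrative only.

import java.util.*;

class SimpleLockTable {
    record Lock(long txn, int item, boolean exclusive) {}

    private final List<Lock> held = new ArrayList<>();
    private final Deque<Lock> queue = new ArrayDeque<>();
    private final List<Lock> toNotify = new ArrayList<>();  // granted from the queue, awaiting notification

    // Grant the lock immediately if nothing conflicts; otherwise queue it.
    public synchronized void lock(Lock req) {
        if (conflicts(req)) queue.add(req); else held.add(req);
    }

    // On release, grant the first possible queued lock and then any
    // further queued locks that have become compatible.
    public synchronized void unlock(long txn, int item) {
        held.removeIf(l -> l.txn() == txn && l.item() == item);
        for (Iterator<Lock> it = queue.iterator(); it.hasNext(); ) {
            Lock w = it.next();
            if (!conflicts(w)) { it.remove(); held.add(w); toNotify.add(w); }
        }
    }

    // Called regularly by each server thread while it talks to its client.
    public synchronized List<Lock> checkNotify() {
        List<Lock> out = new ArrayList<>(toNotify);
        toNotify.clear();
        return out;
    }

    private boolean conflicts(Lock req) {
        return held.stream().anyMatch(l -> l.item() == req.item()
                && (l.exclusive() || req.exclusive()) && l.txn() != req.txn());
    }
}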

16 LockManagerProtocol An LMMultiServerThread uses a LockManagerProtocol to manage communication with a client. The LockManagerProtocol parses the input from the client (and should do more error processing than it does at present). The LockManagerProtocol calls the LockManager’s services for the parsed commands.

17 Example lock management implementation – concurrency A lock manager uses threads to serve incoming connections. As the threads access the lock manager’s services concurrently, those methods are qualified as synchronized. This may actually be overkill, as it seems to be enough to synchronize the public methods (xlock/slock/unlock). An interactive client uses a single thread, but only to access the sleep method. There is no concurrent data access and consequently there are no synchronized methods.

18 Server efficiency It would be more efficient to pick threads from a pool of idle threads rather than always creating a new one. The data structures are not optimized. As there is typically a large number of data items and only an unpredictable subset of them is locked at any one moment, a hashing data structure would suit the lock table (like Hashtable or HashMap; a hash function must exist for the key type, and e.g. Integer is ok). Each lock item in the lock table could then hold a direct reference to its own lock request queue (e.g. a linked list), as sketched below.
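
A sketch of such a hashed lock table: a HashMap keyed by item number, with each item's request queue attached by direct reference. The LockRequest record is hypothetical.

import java.util.*;

class HashedLockTable {
    record LockRequest(long txnId, boolean exclusive) {}  // hypothetical stand-in

    // Item number -> that item's lock request queue, reached in one lookup.
    private final Map<Integer, Deque<LockRequest>> table = new HashMap<>();

    // Expected O(1): find or create the item's queue and append the request.
    void enqueue(int item, LockRequest req) {
        table.computeIfAbsent(item, i -> new ArrayDeque<>()).add(req);
    }

    // Drop the entry once nothing is held or queued for the item, so the
    // table only contains the (unpredictable) set of currently locked items.
    void removeIfEmpty(int item) {
        Deque<LockRequest> q = table.get(item);
        if (q != null && q.isEmpty()) table.remove(item);
    }
}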

19 Example lock management implementation – txn ids Currently long integers are used. Handing out global txn ids sequentially (over all sites) is either complicated or needs a centralised server -> it is better to construct txn ids from a (site-id, local-txn-id) pair. If the sites are known in advance, they can be numbered. We only care about txns accessing some resource manager’s services, and there may be a fixed set of those -> ask the first server you talk with for a (server-id, local-txn-id) pair.
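
A sketch of such a composite txn id; the record name and the bit layout are assumptions.

record GlobalTxnId(int siteId, long localTxnId) {
    // Pack into one long, assuming site ids fit in 16 bits; the result is
    // globally unique as long as each site hands out its own local ids sequentially.
    long asLong() { return ((long) siteId << 48) | (localTxnId & 0xFFFFFFFFFFFFL); }
}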

20 Final notes on the software This is demo software. It is not bug free. If/once you find bugs, I am glad to hear about them. Even better if you fix them. It is not robust (error management is far from perfect). However, it may be able to run for some time without errors. If you for any reason (and I cannot think of one) need to use the software, I can give you an official license to use it.

21 Distributed Deadlock Management

22 Deadlock - Introduction Centralised example: –T1 locks X at time t1. –T2 locks Y at time t2 > t1. –T1 attempts to lock Y at time t3 > t2 and gets blocked. –T2 attempts to lock X at time t4 > t3 and gets blocked.

23 Deadlock (continued) Deadlock can occur in centralised systems. For example: –At the operating system level there can be resource contention between processes. –At the transaction processing level there can be data contention between transactions. –In a poorly designed multithreaded program, there can be deadlock between threads in the same process, as the sketch below shows.
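
A classic sketch of that last case with two Java threads, mirroring the T1/T2 example above: each thread takes one lock and then blocks forever on the other's.

public class DeadlockDemo {
    public static void main(String[] args) {
        final Object x = new Object(), y = new Object();
        new Thread(() -> {
            synchronized (x) {           // T1 locks X
                pause();                 // give the other thread time to lock Y
                synchronized (y) { }     // T1 attempts Y and blocks: T2 holds it
            }
        }).start();
        new Thread(() -> {
            synchronized (y) {           // T2 locks Y
                pause();
                synchronized (x) { }     // T2 attempts X and blocks: deadlock
            }
        }).start();
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}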

24 Distributed deadlock ”management” approaches The approach distributed systems designers take to the deadlock problem depends on the frequency with which deadlocks occur. Possible strategies: –Ignore it. –Detection (and recovery). –Prevention and avoidance (by statically making deadlock structurally impossible and by allocating resources carefully). –Detect local deadlock and ignore global deadlock.

25 Ignore deadlocks? If the system ignores deadlocks, then the application programmers have to write their applications in such a way that a timeout will force the transaction to abort and possibly re-start. The same approach is sometimes used in the centralised world.

26 Distributed Deadlock Prevention and Avoidance Some proposed techniques are not feasible in practice, like making a process request all of its resources at the start of execution. For transaction processing systems with timestamps, the following scheme can be implemented (as in the centralised world): –When a process blocks, its timestamp is compared to the timestamp of the blocking process. –The blocked process is only allowed to wait if it has a higher timestamp. –This avoids any cyclic dependency, since waiting is only ever permitted in one direction of the timestamp order.

27 Wound-wait and wait-die In the wound-wait approach, if an older process requests a lock on an item held by a younger process, it wounds the younger process and effectively kills it. In the wait-die approach, if a younger process requests a lock on an item held by an older process, the younger process commits suicide. Both of these approaches kill transactions blindly: there does not need to be an actual deadlock. Both rules are sketched below.
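
Both rules reduce to a single timestamp comparison; a sketch, where "older" means a smaller start timestamp and the names are illustrative:

enum Action { WAIT, ABORT_HOLDER, ABORT_REQUESTER }

final class DeadlockPrevention {
    // Wound-wait: an older requester wounds (aborts) the younger holder;
    // a younger requester is allowed to wait.
    static Action woundWait(long requesterTs, long holderTs) {
        return requesterTs < holderTs ? Action.ABORT_HOLDER : Action.WAIT;
    }

    // Wait-die: an older requester waits; a younger requester dies
    // (aborts itself).
    static Action waitDie(long requesterTs, long holderTs) {
        return requesterTs < holderTs ? Action.WAIT : Action.ABORT_REQUESTER;
    }
}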

28 Further considerations for wound-wait and wait-die To reduce the number of unnecessarily aborted transactions, it is possible to use the cautious waiting rule: –”You are always allowed to wait for a process which is not itself waiting for another process.” The aborted processes are re-started with their original timestamps to guarantee liveness. Otherwise, a transaction might never make progress if it gets aborted over and over again.

29 Local waits-for graphs Each resource manager can maintain its local ’waits-for’ graph. A coordinator maintains a global waits-for graph. It can be computed on request, or automatically from time to time. When the coordinator detects a deadlock, it selects a victim process and kills it (thereby causing its resource locks to be released), breaking the deadlock.

30 Waits-for graph Centralised: a cycle in the waits-for graph means a deadlock. [Figure: a waits-for graph over T1, T2 and T3 containing a cycle.]

31 Local vs Global Distributed: collect the local graphs and create a global waits-for graph. [Figure: Site 1 and Site 2 each hold a local waits-for graph over some of T1, T2 and T3; their union forms the global waits-for graph.] A cycle-detection sketch over such a graph follows.
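
A sketch of deadlock detection as cycle detection on such a graph, using depth-first search; the graph representation is an assumption.

import java.util.*;

class WaitsForGraph {
    // waiter -> the set of transactions it waits for.
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, t -> new HashSet<>()).add(holder);
    }

    // A deadlock exists iff depth-first search finds a back edge (a cycle).
    boolean hasDeadlock() {
        Set<String> done = new HashSet<>(), onPath = new HashSet<>();
        for (String t : edges.keySet()) {
            if (!done.contains(t) && dfs(t, done, onPath)) return true;
        }
        return false;
    }

    private boolean dfs(String t, Set<String> done, Set<String> onPath) {
        if (onPath.contains(t)) return true;   // back edge: a cycle
        if (done.contains(t)) return false;
        onPath.add(t);
        for (String u : edges.getOrDefault(t, Set.of())) {
            if (dfs(u, done, onPath)) return true;
        }
        onPath.remove(t);
        done.add(t);
        return false;
    }
}

For instance, after addWait("T1","T2"), addWait("T2","T3") and addWait("T3","T1"), hasDeadlock() returns true.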

32 Local waits-for graphs – problems It is impossible to collect all the information at the same time, so that it would reflect a ”global state” of locking. Since the information is collected from ”different times”, it may be that locks are released in between and the global waits-for graph is wrong. Therefore, information about changes must be transmitted to the coordinator. The coordinator knows nothing about change information that is still in transit. In practice, many of the deadlocks it thinks it has detected will be what are called in the trade ’phantom deadlocks’.

33 Global states and snapshot computation

34 Global state We assume that between each pair (Si, Sj) of sites there is a reliable order-preserving communication channel C{i,j}, whose contents is a list of messages L{i,j} = m1(i,j), m2(i,j), ..., mk(i,j). Let L = {L{i,j}} be the collection of all message lists and let K be the collection of all local states. We say that the pair G = (K, L) is the global state of the system.

35 Consistent cut We say that a global state G = (K, L) is a consistent cut if, for each event e in G, G contains all events e' such that e' <H e. That is, no event that happened before an event e in G is missing from G.

36 Which lines cut a consistent cut? [Figure: four site timelines T, T', T'', T''' exchanging messages m1–m9, crossed by candidate cut lines.]

37 Distributed snapshots A state of a site at one point in time is called a snapshot. There is no way we can take the snapshots of all sites simultaneously. If we could, that would solve deadlock detection. Therefore, we want to create a snapshot that reflects a consistent cut.

38 Computing a distributed snapshot – assumptions If we form a graph with sites as nodes and communication channels as edges, we assume that the graph is connected. Neither channels nor processes fail. Any site may initiate a snapshot computation. There may be several simultaneous snapshot computations.

39 Computing a distributed snapshot / 1 When a site Sj initiates snapshot collection, it records its state and sends a snapshot token to all sites it communicates with.

40 Computing a distributed snapshot / 2 If a site Si, i ≠ j, receives a snapshot token for the first time, and it receives it from Sk, it does the following: 1. Stops processing messages. 2. Makes L(k,i) the empty list. 3. Records its state. 4. Sends a snapshot token to all sites it communicates with. 5. Continues to process messages.

41 Computing a distributed snapshot / 3 If a site Si receives a snapshot token from Sk and it has already received a snapshot token earlier, or i = j, then the list L(k,i) is the list of messages Si has received from Sk after recording its state. The token rules of slides 39-41 are sketched below.
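
A sketch of these token rules in the style of the Chandy-Lamport algorithm; message transport, state recording and the neighbour set are stubbed assumptions.

import java.util.*;

class SnapshotSite {
    private final int self;                    // this site's id
    private final Set<Integer> neighbours;     // sites this site communicates with
    private boolean recorded = false;
    private Object localState;
    private final Map<Integer, List<Object>> L = new HashMap<>();  // L.get(k) = L(k, self)
    private final Set<Integer> tokenSeenFrom = new HashSet<>();

    SnapshotSite(int self, Set<Integer> neighbours) {
        this.self = self;
        this.neighbours = neighbours;
    }

    // Slide 39: the initiator records its state and sends out the tokens.
    void initiateSnapshot() { recordAndPropagate(); }

    // Slides 40-41: a snapshot token arrives from site k.
    void onToken(int k) {
        if (!recorded) {                        // first token: slide 40
            recordAndPropagate();
            L.put(k, new ArrayList<>());        // L(k, self) is the empty list
        }
        tokenSeenFrom.add(k);                   // later messages from k are past the cut
    }

    // An ordinary message from site k; record it in L(k, self) if it was
    // in transit at the cut (slide 41), then process it as usual.
    void onMessage(int k, Object m) {
        if (recorded && !tokenSeenFrom.contains(k)) {
            L.computeIfAbsent(k, x -> new ArrayList<>()).add(m);
        }
        // ... normal message processing ...
    }

    private void recordAndPropagate() {
        localState = new Object();              // stub for recording the local state
        recorded = true;
        for (int n : neighbours) sendToken(n);
    }

    private void sendToken(int n) { /* stubbed transport */ }
}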

42 Distributed snapshot example [Figure: timelines for S1–S4 exchanging messages m1–m5 and the snap tokens.] Assume the connection set { (S1,S2), (S2,S3), (S3,S4) }. Then L(1,2) = { m1 }, L(3,2) = {}, L(4,3) = { m3, m4 }. The effects of m2 are included in the state of S4. Message m5 takes place entirely after the snapshot computation.

43 Termination of the snapshot algorithm Termination is, in fact, straightforward to see. Since the graph of sites is connected, and local processing and message delivery between any two sites take only a finite time, all sites will be reached and processed within a finite time.

44 The snapshot algorithm selects a consistent cut Proof. Assume for contradiction that the cut contains an event ei on site Si but is missing an event ej on site Sj with ej <H ei. Then there is a chain of messages m1, m2, ..., mk leading from ej to ei. Since Sj recorded its state before ej occurred, a snapshot token travels ahead of each of m1, m2, ..., mk along this chain, so Si has recorded its state before ei. Therefore ei is not in the cut. This is a contradiction, and we are done.

45 A picture relating to the theorem [Figure: sites Si and Sj, with a chain of messages m1, m2, ..., mk leading from ej to ei, where ej <H ei.] Assume for contradiction -> ... Conclusion: Si must have recorded its state before ei, so ei can not be in the cut - contradiction.

46 Several simultaneous snapshot computations The snapshot tokens have to be distinguishable, and the sites simply manage each separate snapshot computation separately.

47 Snapshots and deadlock detection In this case, the state to be recorded is the waits-for graph together with the lock request / lock release messages in transit. After the snapshot computation, the waits-for graphs are collected at one site, say the initiator of the snapshot computation. For this, the snap messages should include the identity of the initiator of the snapshot computation.

