1 Massively Distributed Database Systems - Transaction Management, Spring 2014. Ki-Joune Li, http://isel.cs.pusan.ac.kr/~lik, Pusan National University.

2 Basic Concepts of a Transaction. A transaction is a set of operations executed atomically: all of them or none (all or nothing), so the database moves from one consistent state to another. Example: a flight reservation. A partially done transaction, by contrast, leaves the database in an inconsistent state.

3 Transaction States. Active: the initial state; the transaction stays in this state while it is executing. Partially committed: after the last operation has been executed, before the commit is made permanent. Failed: after the discovery that normal execution can no longer proceed. Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction; the system can then restart the transaction or kill it. Committed: after successful completion. A committed transaction corresponds to ALL, an aborted one to NOTHING.

4 Transition between Consistent States. A transaction is a set of operations that takes the database from one consistent state to another. Example: flight reservation. (a) BEGIN_TRANSACTION; reserve WP -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi; END_TRANSACTION. (b) BEGIN_TRANSACTION; reserve WP -> JFK; reserve JFK -> Nairobi; reserve Nairobi -> Malindi is full => ABORT_TRANSACTION.

5 ACID Properties. Atomicity: all or nothing, never partially done (example: a failure in the middle of a flight reservation must not leave a partial booking). Consistency: execution of a transaction preserves the consistency of the database; the all-or-nothing behavior takes the database from consistent State 1 to consistent State 2, whereas a partially done transaction would leave an inconsistent State 2'.

6 ACID Properties (continued). Isolation: although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions; intermediate transaction results must be hidden from them, so concurrent transactions have no visible effect on each other. Durability: after a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

7 Example. Transaction: transfer $50 from account A to account B: 1. read(A); 2. A := A - 50; 3. write(A); 4. read(B); 5. B := B + 50; 6. write(B). Consistency requirement: the sum of A and B is unchanged by the transaction. Atomicity requirement: if the transaction fails after step 3 and before step 6, the partial update must not remain in the database. Durability requirement: once the transaction has committed, its updates must survive failures. Isolation requirement: other transactions must not observe the intermediate state between steps 3 and 6.
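
To make the atomicity and consistency requirements concrete, here is a minimal sketch of the transfer as code. The in-memory accounts dictionary and the read/write helpers are assumptions made for illustration, not part of the slides.

    # Hypothetical in-memory "database"; names are illustrative only.
    accounts = {"A": 1000, "B": 2000}

    def read(item):
        return accounts[item]

    def write(item, value):
        accounts[item] = value

    def transfer_50():
        # Remember the old values so a failure can be rolled back (all or nothing).
        old = {k: accounts[k] for k in ("A", "B")}
        try:
            a = read("A")
            write("A", a - 50)     # steps 1-3
            b = read("B")
            write("B", b + 50)     # steps 4-6
        except Exception:
            accounts.update(old)   # undo partial effects before re-raising
            raise

    transfer_50()
    assert accounts["A"] + accounts["B"] == 3000   # consistency: the sum is unchanged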

8 Example: Concurrent Execution. Two transactions: T1 transfers $50 from A to B; T2 transfers 10% of the balance from A to B. They may run in a serial schedule (one completely before the other) or in a concurrent, interleaved schedule.

9 Serializability. What is the state of the database after these transactions? A serial schedule (T1 -> T2 or T2 -> T1) is always correct. A concurrent schedule is serializable if Result(T1 || T2) = Result(T1 -> T2) or Result(T2 -> T1).

10 Transaction Management. Transaction management guarantees the ACID properties by two mechanisms: concurrency control, which provides isolation and consistency, and recovery, which provides atomicity and durability.

11 Transaction management: Concurrency Control

12 Serializability. For given transactions T1, T2, ..., Tn, a schedule (history) S is serializable if Result(S) = Result(Sk), where Sk is some serial execution schedule. Note that Result(Si) may be different from Result(Sj) for i ≠ j. Whether S is serializable can be detected with a conflict graph.

13 Conflict Graph. (Figure: two schedules S1 and S2 over transactions T1 and T2 with operations r(a), w(a), r(b), w(b); an operation of one transaction affects a conflicting operation on the same item in the other. For one schedule the result equals that of the serial order (T1, T2); for the other, the result differs from both Res((T1, T2)) and Res((T2, T1)).)

14 Detecting a Cycle in the Conflict Graph. Draw an edge Ti -> Tj whenever an operation of Ti affects (conflicts with and precedes) an operation of Tj, for example w(a) of T1 affecting r(a) of T2. If there is a cycle in the conflict graph, the schedule is not serializable; otherwise it is serializable.
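
As a sketch of this test, the following code builds a conflict graph from a schedule given as (transaction, operation, item) triples and looks for a cycle with a depth-first search. The schedule encoding and the function names are assumptions made for the example.

    # Schedule: list of (transaction, op, item); 'r' = read, 'w' = write.
    schedule = [("T1", "r", "a"), ("T2", "w", "a"),
                ("T1", "r", "b"), ("T2", "w", "b")]

    def conflict_graph(schedule):
        """Edge Ti -> Tj if an operation of Ti precedes a conflicting op of Tj."""
        edges = set()
        for i, (ti, op_i, x) in enumerate(schedule):
            for tj, op_j, y in schedule[i + 1:]:
                if ti != tj and x == y and "w" in (op_i, op_j):
                    edges.add((ti, tj))
        return edges

    def has_cycle(edges):
        """Depth-first search for a cycle in the directed conflict graph."""
        graph = {}
        for u, v in edges:
            graph.setdefault(u, []).append(v)
        visiting, done = set(), set()

        def visit(u):
            if u in visiting:
                return True            # back edge found: there is a cycle
            if u in done:
                return False
            visiting.add(u)
            if any(visit(v) for v in graph.get(u, [])):
                return True
            visiting.remove(u)
            done.add(u)
            return False

        return any(visit(u) for u in graph)

    edges = conflict_graph(schedule)
    print("serializable" if not has_cycle(edges) else "not serializable")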

15 How to Make a Schedule Serializable. Control the order of execution of operations in concurrent transactions. Two approaches: the two-phase locking protocol, which takes a lock for each operation, and timestamping, which orders transactions and their operations by timestamp.

16 Lock-Based Protocols. A lock mechanism controls concurrent access to a data item. Data items can be locked in two modes: exclusive (X) mode, in which the item can be both read and written (an X-lock is requested with the lock-X instruction), and shared (S) mode, in which the item can only be read (an S-lock is requested with the lock-S instruction). Lock requests are made to the concurrency-control manager, and a transaction can proceed only after its request is granted.

17 Lock-Based Protocol. Lock-compatibility matrix: S is compatible with S, while X is compatible with neither S nor X. A transaction may be granted a lock on an item if the requested lock is compatible with the locks already held on that item; in particular, any number of transactions can hold shared locks simultaneously. If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks have been released; the lock is then granted.
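
A toy, single-site lock manager illustrating the compatibility check and the "wait until granted" rule; the class and its behavior (returning False instead of actually blocking the caller) are assumptions made to keep the sketch short.

    # Compatibility matrix: shared (S) is compatible only with shared.
    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    class LockManager:
        def __init__(self):
            self.locks = {}   # item -> list of (transaction, mode)

        def can_grant(self, item, mode):
            return all(COMPATIBLE[(held, mode)]
                       for _, held in self.locks.get(item, []))

        def lock(self, txn, item, mode):
            """Return True if granted; a real manager would block the caller instead."""
            if self.can_grant(item, mode):
                self.locks.setdefault(item, []).append((txn, mode))
                return True
            return False      # caller must wait and retry later

        def unlock(self, txn, item):
            self.locks[item] = [(t, m) for t, m in self.locks.get(item, [])
                                if t != txn]

    lm = LockManager()
    assert lm.lock("T1", "Q", "S")        # S-lock granted
    assert lm.lock("T2", "Q", "S")        # another S-lock is compatible
    assert not lm.lock("T3", "Q", "X")    # X-lock must wait for the S-locks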

18 Lock-Based Protocol in Distributed DBS: Majority Protocol. A local lock manager at each site administers lock and unlock requests for the data items stored at that site. When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager; if Q is locked in an incompatible mode, the request is delayed until it can be granted, and when it can be granted the lock manager sends a message back to the initiator indicating that the lock request has been granted. If Q is replicated at n sites, a lock request message must be sent to more than half of the n sites storing Q, and the transaction does not operate on Q until it has obtained a lock on a majority of the replicas. When writing the data item, the transaction performs the write on all replicas.
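
A sketch of the majority rule for a replicated item. The per-site lock managers are stubs and message passing is simulated by direct calls; in a real system the lock requests would be messages sent to each site's lock manager.

    # Each replica site has its own (stub) lock manager for item Q.
    class SiteLockManager:
        def __init__(self):
            self.holder = None                  # transaction holding the X-lock, if any

        def request_x_lock(self, txn):
            if self.holder in (None, txn):
                self.holder = txn
                return True                     # grant the request
            return False                        # incompatible: request is delayed

    def majority_lock(txn, replica_sites):
        """Grant the global lock only if more than half of the replicas grant it."""
        granted = sum(1 for site in replica_sites if site.request_x_lock(txn))
        return granted > len(replica_sites) // 2

    sites = [SiteLockManager() for _ in range(5)]     # Q replicated at 5 sites
    assert majority_lock("T1", sites)                 # T1 locks a majority (here all 5)
    assert not majority_lock("T2", sites)             # T2 cannot reach 3 of the 5
    # After obtaining the majority lock, the writing transaction would perform
    # the write on all replicas, as the slide notes.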

19 The Two-Phase Locking Protocol. This protocol ensures conflict-serializable schedules. Phase 1 (growing phase): the transaction may obtain locks but may not release any lock. Phase 2 (shrinking phase): the transaction may release locks but may not obtain any new lock. The protocol assures serializability.

20 Two-Phase Locking. (Figure: number of locks held over time for basic 2PL, where locks may be released during the shrinking phase before the transaction ends, and for strict 2PL, where all locks are held until commit or abort.)
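
A sketch of a transaction run under strict two-phase locking, using Python threading locks as stand-ins for the scheduler's exclusive locks; the fixed lock-acquisition order and the item names are assumptions of this example.

    import threading

    # One exclusive lock per item; a real scheduler would also support S-locks.
    lock_table = {"A": threading.Lock(), "B": threading.Lock()}
    accounts = {"A": 1000, "B": 2000}

    def transfer_strict_2pl(items=("A", "B")):
        held = []
        # Growing phase: acquire every needed lock before touching any item.
        for item in sorted(items):              # a fixed order also avoids deadlock
            lock_table[item].acquire()
            held.append(item)
        try:
            accounts["A"] -= 50                 # all operations run fully locked
            accounts["B"] += 50
        finally:
            # Shrinking phase (strict 2PL): release only at commit/abort.
            for item in held:
                lock_table[item].release()

    transfer_strict_2pl()
    print(accounts)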

21 Problem of the Two-Phase Locking Protocol: Deadlock. Because locks are acquired in the growing phase and released only in the shrinking phase, transactions can block each other in a deadlock. Prevention and avoidance are impossible in general; only detection may be possible. When a deadlock occurs: detect it with a wait-for graph, then abort one of the transactions. How to choose the transaction to kill is itself a question.

22 Timestamp-Based Protocols. Each transaction is issued a timestamp when it enters the system; if TS(Ti) < TS(Tj), Ti is the old transaction and Tj the new one. Each data item Q keeps two timestamps: W-timestamp(Q), the largest timestamp of any successful write(Q), and R-timestamp(Q), the largest timestamp of any successful read(Q).

23 Timestamp-Based Protocols: Read. Transaction Ti issues read(Q). If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten; hence the read operation is rejected and Ti is rolled back. If TS(Ti) >= W-timestamp(Q), then the read operation is executed and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).

24 Timestamp-Based Protocols: Write. Transaction Ti issues write(Q). If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that this value would never be produced; hence the write operation is rejected and Ti is rolled back. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q; hence this write is rejected and Ti is rolled back. Otherwise, the write operation is executed and W-timestamp(Q) is set to TS(Ti).
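
A sketch of the read and write tests from the last two slides; the data-item class, the Rollback exception, and the concrete timestamps are assumptions used only to exercise the rules.

    class TimestampedItem:
        def __init__(self, value):
            self.value = value
            self.r_ts = 0      # largest timestamp of a successful read
            self.w_ts = 0      # largest timestamp of a successful write

    class Rollback(Exception):
        """Raised when the issuing transaction must be rolled back."""

    def ts_read(q, ts):
        if ts < q.w_ts:                 # the value was already overwritten
            raise Rollback()
        q.r_ts = max(q.r_ts, ts)
        return q.value

    def ts_write(q, ts, value):
        if ts < q.r_ts or ts < q.w_ts:  # a newer transaction already read or wrote Q
            raise Rollback()
        q.value = value
        q.w_ts = ts

    q = TimestampedItem(100)
    ts_write(q, ts=5, value=150)        # accepted: newest write so far
    ts_read(q, ts=7)                    # accepted: R-timestamp(Q) becomes 7
    try:
        ts_write(q, ts=6, value=0)      # rejected: TS(Ti) < R-timestamp(Q)
    except Rollback:
        print("transaction with TS = 6 rolled back")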

25 How to Manage a Global Timestamp: Clocks. In a distributed system there are multiple physical clocks and no centralized clock, so a logical clock is used rather than a global physical clock.

26 Global Clock and Logical Clock. Global clock: TAI (Temps Atomique International, maintained in Paris), distributed by a time server or broadcast from a satellite, and limited by its granularity. Logical clock: not absolute time, but an ordering of events; Lamport's algorithm corrects the clocks so that for every message the send time is smaller than the receive time, T(A, tx) < T(B, rx).

27 Logical Clock. Not absolute time, but an ordering. Lamport's algorithm corrects the clocks, which never run backward, so that C(A, tx) < C(B, rx) for every message. (Figure: three processes whose clocks tick at different rates; when a message arrives carrying a timestamp at or ahead of the local clock, the receiver advances its clock past the message's timestamp.)
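
A minimal Lamport clock sketch; the class and method names are assumptions, but the two rules follow the slide: advance the clock on every local event or send, and on receipt jump past the message's timestamp so the clock never runs backward and C(A, tx) < C(B, rx).

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):
            """Local event or message send: advance the clock by one."""
            self.time += 1
            return self.time

        def receive(self, msg_timestamp):
            """Message receipt: jump past the sender's timestamp if it is ahead."""
            self.time = max(self.time, msg_timestamp) + 1
            return self.time

    a, b = LamportClock(), LamportClock()
    send_ts = a.tick()              # C(A, tx) = 1
    recv_ts = b.receive(send_ts)    # C(B, rx) = 2, so C(A, tx) < C(B, rx)
    assert send_ts < recv_ts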

28 Implementation of Concurrency Control in Distributed Systems. Three managers: the TM (transaction manager), which ensures atomicity; the scheduler, which has the main responsibility for concurrency control; and the DM (data manager), which performs the simple reads and writes. The same three components appear in a single machine and, spread across sites, in distributed systems.

29 Transaction Management: Recovery

30 Failure Classification. Transaction failure: logical errors (an internal error condition) or system errors (a system error condition such as deadlock). System crash: a power failure or other hardware or software failure; under the fail-stop assumption, non-volatile storage contents are not corrupted by a system crash, and database systems have numerous integrity checks to prevent corruption of disk data. Disk failure.

31 Recovery Algorithms. Recovery algorithms should ensure database consistency, transaction atomicity, and durability despite failures. They have two parts: 1. preparing information for recovery during normal transaction processing, and 2. actions taken after a failure to recover the database.

32 Storage Structure. Volatile storage does not survive system crashes (examples: main memory, cache memory). Nonvolatile storage survives system crashes (examples: disk, tape, flash memory, non-volatile battery-backed RAM). Stable storage is a mythical form of storage that survives all failures, approximated by maintaining multiple copies on distinct nonvolatile media.

33 Recovery and Atomicity. Modifications should become permanent in the database only if the transaction commits; otherwise a failure may leave the database in an inconsistent state. Example: consider a transaction Ti that transfers $50 from account A to account B; the goal is to perform either all of Ti's database modifications or none at all. Several output operations may be required for Ti, for example output(A) and output(B), and a failure may occur after one of these modifications has been made but before all of them are.

34 Recovery and Atomicity (continued). To ensure atomicity despite failures, we first output information describing the modifications to stable storage, without modifying the database itself: log-based recovery.

35 Log-Based Recovery. A log must be kept on stable storage. Logging method: when transaction Ti starts, a <Ti start> record is written to the log; when Ti finishes, a <Ti commit> record is written; before Ti executes write(X), a log record describing the write of X is written. We assume for now that log records are written directly to stable storage. There are two approaches using logs: deferred database modification and immediate database modification.

36 Deferred Database Modification. The deferred database modification scheme records all modifications in the log and writes them to the database only after the commit. Log scheme: the transaction starts by writing a <Ti start> record to the log; a write(X) operation results in a log record <Ti, X, V>, where V is the new value (the old value is not needed in this scheme), and the write to X itself is not performed at this time but deferred; when Ti commits, <Ti commit> is written to the log, and finally the previously deferred writes are executed.

37 Deferred Database Modification (continued). Recovery method: during recovery after a crash, a transaction Ti needs to be redone if and only if both <Ti start> and <Ti commit> are in the log. Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the transaction to the new values. A transaction Ti for which <Ti start> exists in the log but <Ti commit> does not is simply discarded, since its deferred writes were never applied.

38 Deferred Database Modification: Example. T0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B). T1: read(C); C := C - 100; write(C). If the log on stable storage at the time of the crash is as in case: (a) no redo action needs to be taken; (b) redo(T0) must be performed, since <T0 commit> is present; (c) redo(T0) must be performed followed by redo(T1), since both <T0 commit> and <T1 commit> are present.

39 Immediate Database Modification. In the immediate database modification scheme, database updates of an uncommitted transaction may already have been applied; for undoing them, the log record must hold both the old value and the new value. The recovery procedure has two operations: undo(Ti) restores the value of all data items updated by Ti to their old values; redo(Ti) sets the value of all data items updated by Ti to the new values. When recovering after a failure: undo Ti if the log contains <Ti start> but not <Ti commit>; redo Ti if the log contains both <Ti start> and <Ti commit>.

40 Immediate Database Modification: Example. With log records of the form <Ti, X, old value, new value>, the recovery actions in each case above are: (a) undo(T0): B is restored to 2000 and A to 1000; (b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively; (c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively, and then C is set to 600.
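
A sketch of recovery under the immediate-modification scheme for case (b) above. The log is a list of tuples mirroring the <Ti, X, old value, new value> records; the exact encoding, and representing start/commit records as plain tuples, are assumptions of this example.

    db = {"A": 950, "B": 2050, "C": 600}        # state on disk at crash time

    # Log for case (b): T0 committed, T1 did not.
    log = [("start", "T0"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
           ("commit", "T0"), ("start", "T1"), ("T1", "C", 700, 600)]

    def recover(db, log):
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        started = {rec[1] for rec in log if rec[0] == "start"}
        uncommitted = started - committed
        # undo(Ti): restore old values of uncommitted transactions, newest first.
        for rec in reversed(log):
            if len(rec) == 4 and rec[0] in uncommitted:
                txn, item, old, new = rec
                db[item] = old
        # redo(Ti): reapply new values of committed transactions, oldest first.
        for rec in log:
            if len(rec) == 4 and rec[0] in committed:
                txn, item, old, new = rec
                db[item] = new

    recover(db, log)
    print(db)    # {'A': 950, 'B': 2050, 'C': 700}: undo(T1), redo(T0)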

41 Idempotent Operations. An operation Op is idempotent if Result(Op(x)) = Result(Op(Op(x))). Example: increment(x) is not idempotent; x := a; write(x) is idempotent. The operations recorded in the log must be idempotent; otherwise multiple executions (during redo) may cause incorrect results.
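
A tiny illustration of the point: redoing an absolute write twice gives the same result, whereas redoing an increment does not, which is why the recovery sketch above replays stored new values rather than operations like increment.

    x = 100

    def redo_write_x(new_value):        # idempotent: "x := a; write(x)"
        global x
        x = new_value

    def redo_increment():               # not idempotent: "increment(x)"
        global x
        x += 50

    redo_write_x(150); redo_write_x(150)
    assert x == 150                     # same result after a repeated redo
    redo_increment(); redo_increment()
    assert x == 250                     # repeating the redo changed the result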

42 Mutual Exclusion and Election Algorithms

43 Mutual Exclusion: Monitor (Coordinator). In a single system, mutual exclusion is easy to implement with a semaphore; in a distributed system there is no shared data for a semaphore. A centralized algorithm, in which one coordinator (monitor) grants permission to enter the critical section, is simple but creates a single point of failure.

44 Mutual Exclusion: Distributed Algorithm. There is no central coordinator. Rule: when a process receives a request, if it is not in the critical section and does not want to enter it, it replies OK to the requester; if it is in the critical section, it sends no reply (the answer is deferred); if it also wants to enter, it compares timestamps and the request with the lower timestamp wins.
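
A sketch of the reply rule a process might apply when it receives a request; this follows the Ricart-Agrawala style of algorithm the slide describes, and the state names, the deferred-reply queue, and tie-breaking on (timestamp, process id) are assumptions of the sketch.

    RELEASED, WANTED, HELD = "released", "wanted", "held"

    class Process:
        def __init__(self, pid):
            self.pid = pid
            self.state = RELEASED
            self.my_ts = None          # timestamp of my own pending request
            self.deferred = []         # requests answered only after leaving the CS

        def on_request(self, req_ts, req_pid):
            """Decide whether to reply OK now or defer the reply."""
            if self.state == HELD:
                self.deferred.append(req_pid)            # in CS: no reply yet
            elif self.state == WANTED and (self.my_ts, self.pid) < (req_ts, req_pid):
                self.deferred.append(req_pid)            # my request wins (lower TS)
            else:
                return "OK"                              # not interested: reply OK
            return None

    p = Process(pid=2)
    p.state, p.my_ts = WANTED, 10
    print(p.on_request(req_ts=12, req_pid=5))   # None: p's own request (TS 10) wins
    print(p.on_request(req_ts=7, req_pid=5))    # "OK": the requester has the lower TS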

45 Mutual Exclusion: Token Ring. Only the process holding the token can enter the critical section. When the token arrives at a process that wants to enter the CS, the process keeps the token while it is in the CS and then passes it on; otherwise it passes the token to the next process immediately. A process that wants to enter therefore waits until it gets the token.

46 Mutual Exclusion: Comparison.

    Algorithm                Messages per request   Delay per request   Problem
    Monitor (centralized)    3                      2                   crash of the monitor
    Distributed algorithm    2(n-1)                 2(n-1)              n points of crash
    Token ring               1 to ∞                 0 to n-1            lost token

47 Election: Bully Algorithm. Used when it is found that a new coordinator is required. (Figure: the process that notices the coordinator is gone sends ELECTION messages to all processes with higher numbers; if no one responds, it wins the election and becomes coordinator; if a higher-numbered process answers, that process takes over the election, and eventually the highest-numbered surviving process announces itself as the new coordinator.)
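
A sketch of the core bully rule that the slide's figure presumably illustrates: the process that detects the failure challenges all higher-numbered processes, and ultimately the highest-numbered live process becomes coordinator. The set of live processes and the simulation of the message exchange by recursion are assumptions.

    def bully_election(initiator, alive_ids):
        """Return the new coordinator id, simulating the message exchange."""
        higher = [pid for pid in alive_ids if pid > initiator]
        if not higher:
            return initiator            # nobody bigger answered: I am the coordinator
        # Any higher process that answers takes over and runs its own election;
        # eventually the highest-numbered live process wins and announces itself.
        return bully_election(min(higher), alive_ids)

    # Processes 1..8, where 8 (the old coordinator) and 6 have crashed.
    alive = {1, 2, 3, 4, 5, 7}
    print(bully_election(initiator=3, alive_ids=alive))   # -> 7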

48 Election: Ring Algorithm. When it is found that the coordinator has crashed, an election message is circulated around the ring, accumulating the identifiers (priorities) of the live sites it passes; sites that do not respond are skipped. In the figure, sites 1 to 8 form a ring and site 3 finds that the coordinator has crashed; the message grows as Elect[3], Elect[3,4], Elect[3,4,5], Elect[3,4,5,7], Elect[3,4,5,7,8], Elect[3,4,5,7,8,2] (sites 6 and 1 give no response), and when it returns to site 3 an Elected message announcing the outcome is circulated (shown in the figure as Elected[3]).
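
A sketch of the ring election in the figure, with the ring order and the set of live sites taken from the example; the convention that the highest collected identifier becomes coordinator is the usual one and is an assumption here, since the slide does not state the selection rule explicitly.

    def ring_election(initiator, ring, alive):
        """Circulate an Elect message around the ring, collecting live site ids."""
        members = [initiator]
        start = ring.index(initiator)
        for i in range(1, len(ring)):
            site = ring[(start + i) % len(ring)]
            if site in alive:                    # crashed sites give no response
                members.append(site)
                print("Elect", members)
        # Back at the initiator: announce the coordinator (highest id, by convention).
        return max(members)

    ring = [1, 2, 3, 4, 5, 6, 7, 8]              # physical ring order
    alive = {2, 3, 4, 5, 7, 8}                   # sites 1 and 6 do not respond
    print("coordinator:", ring_election(initiator=3, ring=ring, alive=alive))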

