Transactions and Their Distribution Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems October 20, 2015.


1 Transactions and Their Distribution Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems October 20, 2015

2 Administrivia  Please read PNUTS paper for discussion next week  Upcoming schedule:  lectures April 14, 19  no class April 21  midterm April 26  Project demo signups will be done online, for finals week 2

3 3 Recall: ACID Semantics  Atomicity: operations are atomic, either committing or aborting as a single entity  Consistency: the state of the data is internally consistent  Isolation: all operations act as if they were run by themselves  Durability: all writes stay persistent!

4 4 Providing Atomicity and Consistency  Database systems provide transactions with the ability to abort a transaction upon some failure condition  Based on transaction logging – record all operations and undo them as necessary  Database systems also use the log to perform recovery from crashes  Undo all of the steps in a partially-complete transaction  Then redo them in their entirety  This is part of a protocol called ARIES  These can be the basis of persistent storage, and we can use middleware like J2EE to build distributed transactions with the ability to abort database operations if necessary

5 5 The Need for Isolation  Suppose eBay seller S has a bank account that we’re depositing money into, as people buy: S = Accounts.Get(1234); Write S.bal = S.bal + $50  What if two purchases occur simultaneously, from two different servers on different continents?

6 6 Concurrent Deposits  This update code is represented as a sequence of read and write operations on “data items” (which for now should be thought of as individual accounts), where S is the data item representing the seller’s account #1234:  Deposit 1: Read(S.bal); S.bal := S.bal + $50; Write(S.bal)  Deposit 2: Read(S.bal); S.bal := S.bal + €10; Write(S.bal)

7 7 A “Bad” Concurrent Execution Only one action (e.g. a read or a write) can actually happen at a time for a given database, and we can interleave deposit operations in many ways. [Figure: the operations of Deposit 1 (Read(S.bal); S.bal := S.bal + $50; Write(S.bal)) and Deposit 2 (Read(S.bal); S.bal := S.bal + €10; Write(S.bal)) interleaved along a time axis, marked BAD!]
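The lost-update problem behind a "bad" interleaving can be sketched in a few lines of Python. The exact interleaving on the slide is not recoverable from the transcript, so the one used here (both reads first, then both updates, then both writes) is an assumption, as are the starting balance, the names `D1` and `D2`, and the decision to ignore currencies.

```python
# Hypothetical simulation: two deposits against one shared balance.
def run(schedule, start=100):
    """Execute a schedule of (transaction, operation) steps in order."""
    db = {"S.bal": start}           # the shared data item
    local = {}                      # each transaction's private working copy
    amount = {"D1": 50, "D2": 10}   # deposit sizes (currency ignored)
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["S.bal"]
        elif op == "add":
            local[txn] += amount[txn]
        elif op == "write":
            db["S.bal"] = local[txn]
    return db["S.bal"]

# One transaction after the other.
serial = [("D1", "read"), ("D1", "add"), ("D1", "write"),
          ("D2", "read"), ("D2", "add"), ("D2", "write")]

# Assumed "bad" interleaving: both read the old balance before either writes.
interleaved = [("D1", "read"), ("D2", "read"),
               ("D1", "add"), ("D2", "add"),
               ("D1", "write"), ("D2", "write")]

print(run(serial))       # 160: both deposits are reflected
print(run(interleaved))  # 110: D1's deposit is overwritten by D2's write
```

In the interleaved run, D2 writes a value computed from a stale read, so the final balance matches neither serial order.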

8 8 A “Good” Execution  Previous execution would have been fine if the accounts were different (i.e. one were S and one were T), i.e., if the transactions were independent  The following execution is a serial execution, and executes one transaction after the other (in time order; GOOD!):  Deposit 1: Read(S.bal); S.bal := S.bal + $50; Write(S.bal)  then Deposit 2: Read(S.bal); S.bal := S.bal + $10; Write(S.bal)

9 9 Good Executions  An execution is “good” if it is serial (transactions are executed atomically and consecutively) or serializable (i.e. equivalent to some serial execution)  Example: an execution that interleaves Deposit 1 (read(S.bal); S.bal := S.bal + $50; write(S.bal)) with Deposit 3 (read(T.bal); T.bal := T.bal + €10; write(T.bal))  Equivalent to executing Deposit 1 then 3, or vice versa  Why would we want to do this instead?
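One common way to test whether a schedule is serializable is to check conflict serializability with a precedence graph. This is a sufficient condition rather than the slide's general definition, and the schedule encoding and the function name `conflict_serializable` below are assumptions made for illustration.

```python
from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) in time order, op is 'R' or 'W'."""
    edges = set()
    # combinations() keeps positional order, so the first step is the earlier one.
    for (_, (t1, op1, x1)), (_, (t2, op2, x2)) in combinations(enumerate(schedule), 2):
        # Two steps conflict if they come from different transactions,
        # touch the same item, and at least one of them is a write.
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    # The graph is acyclic iff we can keep removing nodes with no incoming edge.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False   # every remaining node lies on or behind a cycle
        nodes -= free
    return True

# Two deposits on the same account, reads before writes (not serializable).
bad = [("D1", "R", "S"), ("D2", "R", "S"), ("D1", "W", "S"), ("D2", "W", "S")]
# Deposits on different accounts S and T, interleaved (serializable).
good = [("D1", "R", "S"), ("D3", "R", "T"), ("D1", "W", "S"), ("D3", "W", "T")]

print(conflict_serializable(bad), conflict_serializable(good))
```

Interleaving independent transactions is attractive because it lets them overlap their waits without changing the result.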

10 10 Concurrency Control  A means of ensuring that transactions are serializable  There are many methods, of which we’ll see one  Lock-based concurrency control (2-phase locking)  Optimistic concurrency control (no locks – based on timestamps)  Multiversion CC  …

11 Lock-Based Concurrency Control  Strict Two-phase Locking (Strict 2PL) Protocol:  Each transaction must obtain:  a S (shared) lock on object before reading  an X (exclusive) lock on object before writing  An owner of an S lock can upgrade it to X if no one else is holding the lock  All locks held by a transaction are released when the transaction completes  Locks are handled in a “growing” phase, then a “shrinking” phase  (Non-strict) 2PL Variant: Release locks anytime, but cannot acquire locks after releasing any lock.
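The lock rules above can be sketched as a tiny single-threaded lock table. This is only an illustration under stated assumptions: the class `LockManager` and its methods are hypothetical, and a refused request simply returns `False` where a real system would queue the waiting transaction.

```python
class LockManager:
    """Toy Strict 2PL lock table: S/X locks, upgrade, release at completion."""

    def __init__(self):
        self.holders = {}   # object -> {transaction: "S" or "X"}

    def acquire(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.holders.setdefault(obj, {})
        others = {t: m for t, m in held.items() if t != txn}
        if mode == "S":
            ok = all(m == "S" for m in others.values())   # S is compatible with S
        else:
            ok = not others    # X (or an S-to-X upgrade) needs sole ownership
        if ok and held.get(txn) != "X":
            held[txn] = mode   # never downgrade an X lock already held
        return ok

    def release_all(self, txn):
        """Strict 2PL: every lock is released together when txn completes."""
        for held in self.holders.values():
            held.pop(txn, None)


lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # granted
print(lm.acquire("T2", "A", "S"))   # granted, shared with T1
print(lm.acquire("T1", "A", "X"))   # refused: T2 still holds S
lm.release_all("T2")                # T2 commits or aborts
print(lm.acquire("T1", "A", "X"))   # upgrade now succeeds
```

Because there is no per-object release method, a transaction cannot shrink its lock set before it completes, which is the "strict" part of the protocol.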

12 12 Benefits of Strict 2PL  Strict 2PL allows only serializable schedules.  Additionally, it simplifies transaction aborts  (Non-strict) 2PL also allows only serializable schedules, but involves more complex abort processing

13 Aborting a Transaction  If a transaction Ti is aborted, all its actions have to be undone  Not only that, if Tj reads an object last written by Ti, Tj must be aborted as well!  Most systems avoid such cascading aborts by releasing a transaction’s locks only at commit time  If Ti writes an object, Tj can read this only after Ti commits  Actions are undone by consulting the transaction log mentioned earlier

14 The Transaction Log  The following actions are recorded in the log:  Ti writes an object: the old value and the new value  Log record must go to disk before the changed page does!  Ti commits/aborts: a log record indicating this action  Log records are chained together by transaction id, so it’s easy to undo a specific transaction  Log is often mirrored and archived on stable storage
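A minimal in-memory sketch of such a log follows. The class `LoggedStore`, its field names, and the use of `None` to mean "the key did not exist" are all assumptions; a real system would force the log record to disk before the page is written.

```python
class LoggedStore:
    """Toy key-value store with an undo log chained per transaction."""

    def __init__(self):
        self.data = {}   # stands in for the database pages
        self.log = []    # append-only list of log records
        self.last = {}   # txn -> index of its most recent log record

    def write(self, txn, key, new):
        # Append the log record (old and new value) BEFORE changing the data.
        self.log.append({"type": "update", "txn": txn, "key": key,
                         "old": self.data.get(key), "new": new,
                         "prev": self.last.get(txn)})
        self.last[txn] = len(self.log) - 1
        self.data[key] = new

    def abort(self, txn):
        # Follow the per-transaction chain backwards, restoring old values.
        i = self.last.pop(txn, None)
        while i is not None:
            rec = self.log[i]
            if rec["old"] is None:
                self.data.pop(rec["key"], None)   # key did not exist before
            else:
                self.data[rec["key"]] = rec["old"]
            i = rec["prev"]
        self.log.append({"type": "abort", "txn": txn})


store = LoggedStore()
store.write("T0", "a", 1)
store.write("T1", "a", 2)
store.write("T2", "b", 5)
store.write("T1", "a", 3)
store.abort("T1")
print(store.data)   # T1's two writes to "a" are undone; T2's write to "b" stays
```

The `prev` pointer is what makes undoing one transaction cheap: the abort touches only that transaction's records, not the whole log.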

15 Another Benefit of the Log: Recovering From a Crash  3 phases in the ARIES recovery algorithm:  Analysis  Scan the log forward (from the most recent checkpoint) to identify all pending transactions, unwritten pages  Redo  Redo all updates to unwritten pages in the buffer pool, to ensure that all logged updates are in fact carried out and written to disk  Undo  Undo all writes done by incomplete transactions by working backwards in the log  (Care must be taken to handle the case of a crash occurring during the recovery process!)
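The three phases can be illustrated with a heavily simplified sketch. It is not ARIES itself: there are no LSNs, checkpoints, or compensation log records, the log is assumed to contain only `update` and `commit` records, and strict 2PL is assumed so that nobody overwrote an incomplete transaction's data. The function name `recover` and the record layout are hypothetical.

```python
def recover(log, disk):
    """log: list of record dicts in order; disk: values that survived the crash.

    Returns the set of incomplete ("loser") transactions that were rolled back.
    """
    # Analysis: a transaction with updates but no commit record is incomplete.
    committed = {r["txn"] for r in log if r["type"] == "commit"}
    losers = {r["txn"] for r in log if r["type"] == "update"} - committed

    # Redo: reapply every logged update in log order, so the disk reflects
    # everything the log says happened (reapplying is harmless here).
    for r in log:
        if r["type"] == "update":
            disk[r["key"]] = r["new"]

    # Undo: walk the log backwards and restore old values for the losers.
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] in losers:
            disk[r["key"]] = r["old"]
    return losers


log = [
    {"type": "update", "txn": "T1", "key": "A", "old": 0, "new": 1},
    {"type": "update", "txn": "T2", "key": "B", "old": 0, "new": 5},
    {"type": "commit", "txn": "T1"},
    {"type": "update", "txn": "T2", "key": "A", "old": 1, "new": 7},
]
disk = {"A": 0, "B": 5}      # T1's write never reached disk, T2's first one did
print(recover(log, disk), disk)
```

After recovery the committed write by T1 is present and both writes by the incomplete T2 are gone, regardless of which pages had reached disk before the crash.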

16 A Danger with Locks: Deadlocks  Deadlock: Cycle of transactions waiting for locks to be released by each other  Two ways of dealing with deadlocks:  Deadlock prevention  Deadlock detection

17 Deadlock Prevention  Assign priorities based on timestamps (older = higher)  Assume Ti wants a lock that Tj holds  Do one of:  Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts  Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits  Under Wait-Die only higher-priority transactions ever wait, and under Wound-Wait only lower-priority ones do, so no cycle of waiting transactions can form  If a transaction re-starts, make sure it has its original timestamp  Keeps it from always getting aborted!
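The two policies reduce to a single comparison of timestamps. The sketch below assumes a smaller timestamp means an older, higher-priority transaction; the function name `resolve` and the returned strings are hypothetical.

```python
def resolve(policy, ts_requester, ts_holder):
    """Decide what happens when the requester wants a lock the holder has.

    Smaller timestamp = older transaction = higher priority.
    """
    requester_has_priority = ts_requester < ts_holder
    if policy == "wait-die":
        # Older requesters wait; younger requesters abort ("die").
        return "requester waits" if requester_has_priority else "requester aborts"
    if policy == "wound-wait":
        # Older requesters abort ("wound") the holder; younger requesters wait.
        return "holder aborts" if requester_has_priority else "requester waits"
    raise ValueError(f"unknown policy: {policy}")


print(resolve("wait-die", 1, 2))     # requester waits
print(resolve("wait-die", 2, 1))     # requester aborts
print(resolve("wound-wait", 1, 2))   # holder aborts
print(resolve("wound-wait", 2, 1))   # requester waits
```

Under either policy all waiting goes in one direction of the timestamp order, which is why a waits-for cycle cannot arise.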

18 18 Database Transactions and Concurrency Control, Summarized The basic goal was to guarantee ACID properties  Transactions and logging provide Atomicity and Consistency  Locks ensure Isolation  The transaction log (and RAID, backups, etc.) are also used to ensure Durability So far, we’ve been in the realm of databases – how does this extend to the distributed context?

19 19 Distributed Transactions  We generally rely on a middleware layer called application servers, aka TP monitors, to provide transactions across systems  Tuxedo, iPlanet, WebSphere, etc.  For atomicity, two-phase commit protocol  For isolation, need distributed concurrency control DB Transact Server Transact Server Workflow Controller Msg Queue Web Server App Server Client

20 Two-Phase Commit (2PC)  Site at which a transaction originates is the coordinator; other sites at which it executes are subordinates  Two rounds of communication, initiated by coordinator:  Voting  Coordinator sends prepare messages, waits for yes or no votes  Then, decision or termination  Coordinator sends commit or rollback messages, waits for acks  Any site can decide to abort a transaction!

21 21 Steps in 2PC When a transaction wants to commit:  Coordinator sends prepare message to each subordinate  Subordinate force-writes an abort or prepare log record and then sends a no (abort) or yes (prepare) message to coordinator  Coordinator considers votes:  If unanimous yes votes, force-writes a commit log record and sends commit message to all subordinates  Else, force-writes abort log rec, and sends abort message  Subordinates force-write abort/commit log records based on message they get, then send ack message to coordinator  Coordinator writes end log record after getting all acks
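The steps above can be sketched as a single-process simulation, with Python lists standing in for force-written logs and method calls standing in for messages. The names `Subordinate`, `on_prepare`, `on_decision`, and `run_2pc` are hypothetical, and failures and timeouts are not modeled.

```python
class Subordinate:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.log = []   # stands in for this site's force-written log

    def on_prepare(self):
        # Log the vote before sending it, so the decision survives a crash.
        self.log.append("prepare" if self.vote_yes else "abort")
        return "yes" if self.vote_yes else "no"

    def on_decision(self, decision):
        if self.log[-1] != decision:   # a "no" voter has already logged abort
            self.log.append(decision)
        return "ack"


def run_2pc(subordinates):
    coordinator_log = []
    # Round 1 (voting): send prepare, collect yes/no votes.
    votes = [s.on_prepare() for s in subordinates]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    coordinator_log.append(decision)   # logged before any decision message
    # Round 2 (decision): send commit/abort, collect acks.
    acks = [s.on_decision(decision) for s in subordinates]
    if all(a == "ack" for a in acks):
        coordinator_log.append("end")
    return decision, coordinator_log


a, b = Subordinate("a"), Subordinate("b")
print(run_2pc([a, b]), a.log)
c, d = Subordinate("c"), Subordinate("d", vote_yes=False)
print(run_2pc([c, d]), c.log, d.log)
```

A single "no" vote is enough to abort everywhere, which is the sense in which any site can decide to abort a transaction.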

22 22 Illustration of 2PC (Coordinator, Subordinate 1, Subordinate 2)  Coordinator: force-write begin log entry, send “prepare” to both subordinates  Each subordinate: force-write prepared log entry, send “yes”  Coordinator: force-write commit log entry, send “commit”  Each subordinate: force-write commit log entry, send “ack”  Coordinator: write end log entry

23 Comments on 2PC  Every message reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log  All log records for a transaction contain its ID and the coordinator’s ID  The coordinator’s abort/commit record also includes IDs of all subordinates  Thm: there exists no distributed commit protocol that can recover without communicating with other processes, in the presence of multiple failures!

24 What if a Site Fails in the Middle?  If we have a commit or abort log record for transaction T, but not an end record, we must redo/undo T  If this site is the coordinator for T, keep sending commit/abort msgs to subordinates until acks have been received  If we have a prepare log record for transaction T, but not commit/abort, this site is a subordinate for T  Repeatedly contact the coordinator to find status of T, then write commit/abort log record; redo/undo T; and write end log record  If we don’t have even a prepare log record for T, unilaterally abort and undo T  This site may be coordinator! If so, subordinates may send messages and need to also be undone
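These restart rules amount to a decision table keyed on which log records exist for T. The sketch below assumes the records are summarized as a set of strings; the function name `recovery_action` and the returned descriptions are hypothetical, and the "end record present" case is an added assumption that nothing remains to be done.

```python
def recovery_action(records, is_coordinator=False):
    """records: set of log record types found for transaction T at this site."""
    if "end" in records:
        return "nothing to do"                       # assumed: T fully finished
    if "commit" in records or "abort" in records:
        outcome = "commit" if "commit" in records else "abort"
        step = "redo T" if outcome == "commit" else "undo T"
        if is_coordinator:
            return f"{step}; resend {outcome} until all acks arrive"
        return step
    if "prepare" in records:
        # Voted yes but never learned the outcome: must ask the coordinator.
        return "ask coordinator for status of T"
    # No prepare record: this site never voted, so it may abort on its own.
    return "abort and undo T"


print(recovery_action({"prepare", "commit"}))
print(recovery_action({"commit"}, is_coordinator=True))
print(recovery_action({"prepare"}))
print(recovery_action(set()))
```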

25 Blocking for the Coordinator  If coordinator for transaction T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers  T is blocked  Even if all subordinates know each other (extra overhead in prepare msg) they are blocked unless one of them voted no

26 Link and Remote Site Failures  If a remote site does not respond during the commit protocol for transaction T, either because the site failed or the link failed:  If the current site is the coordinator for T, should abort T  If the current site is a subordinate, and has not yet voted yes, it should abort T  If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds!
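The three cases can be written as one small function. The name `on_timeout` and its arguments are hypothetical; the point is only that a subordinate that has voted yes is the one case with no safe unilateral choice.

```python
def on_timeout(role, voted_yes=False):
    """What the current site does when a remote site stops responding."""
    if role == "coordinator":
        return "abort T"            # no decision has been promised yet
    if not voted_yes:
        return "abort T"            # subordinate has not committed to anything
    return "block until coordinator responds"


print(on_timeout("coordinator"))
print(on_timeout("subordinate", voted_yes=False))
print(on_timeout("subordinate", voted_yes=True))
```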

27 Observations on 2PC  Ack msgs used to let coordinator know when it’s done with a transaction; until it receives all acks, it must keep T in the transaction-pending table  If the coordinator fails after sending prepare msgs but before writing commit/abort log recs, when it comes back up it aborts the transaction

28 28 From Distributed Commits to Distributed Concurrency Control  What we saw were the steps involved in preserving atomicity and consistency in a distributed fashion  Let’s briefly look at distributed isolation (locking)…

29 Distributed Locking How do we manage locks across many sites?  Centralized: One site does all locking  Vulnerable to single site failure  Primary Copy: All locking for an object done at the primary copy site for this object  Reading requires access to locking site as well as site where the object is stored  We’ll see how this is used in PNUTS  Fully Distributed: Locking for a copy done at site where the copy is stored  Locks at all sites holding the object being written

30 Distributed Deadlock Detection  Each site maintains a local waits-for graph  A global deadlock might exist even if the local graphs contain no cycles [Figure: waits-for graphs over T1 and T2 at SITE A and SITE B, and the combined GLOBAL graph] Three solutions:  Centralized (send all local graphs to one site)  Hierarchical (organize sites into a hierarchy and send local graphs to parent in the hierarchy)  Timeout (abort transaction if it waits too long)
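The centralized approach can be sketched by taking the union of the local waits-for graphs and searching it for a cycle. The edge direction at each site below is an assumption (the figure is not recoverable from the transcript), and `has_cycle` is a hypothetical helper using an ordinary depth-first search.

```python
def has_cycle(edges):
    """edges: set of (waiter, holder) pairs in a waits-for graph."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:
            return True                 # back edge: we returned to an ancestor
        visiting.add(node)
        found = any(dfs(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(dfs(node) for node in list(graph))


site_a = {("T1", "T2")}   # assumed: at site A, T1 waits for T2
site_b = {("T2", "T1")}   # assumed: at site B, T2 waits for T1

print(has_cycle(site_a), has_cycle(site_b))   # neither local graph has a cycle
print(has_cycle(site_a | site_b))             # the combined graph does
```

The hierarchical variant would apply the same check at each parent over the graphs of its children, so a deadlock confined to nearby sites is found without involving the whole system.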

31 31 Summary of Transactions and Concurrency  There are many (especially monetary) transfers that need atomicity and isolation  Transactions and concurrency control provide these features  In a distributed, 3-tier setting they run in an Application Server  Similar features are provided in a 2-tier setting for applications that run directly in the DBMS  Two-phase locking ensures isolation  Two-phase commit is a voting scheme for doing distributed commit

