CSL 771: Database Implementation Transaction Processing

Presentation transcript:

CSL 771: Database Implementation Transaction Processing Maya Ramanath All material (including figures) from: Concurrency Control and Recovery in Database Systems Phil Bernstein, Vassos Hadzilacos and Nathan Goodman (http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)

Transactions: Interaction with the DBMS happens through SQL, for example: update Airlines set price = price - price*0.1, status = “cheap” where price < 5000. A transaction is a unit of interaction.

ACID properties: Atomicity, Consistency, Isolation, Durability. The database system must ensure the ACID properties.

Atomicity and Consistency Single transaction Execution of a transaction: “all-or-nothing” Either a transaction completes in its entirety Or it “does not even start” As if the transaction never existed No partial effect must be visible 2 outcomes: A transaction COMMITs or ABORTs

Consistency and Isolation Multiple transactions Concurrent execution can cause an inconsistent database state Each transaction executed as if isolated from the others

Durability If a transaction commits the effects are permanent But, durability has a bigger scope Catastrophic failures (floods, fires, earthquakes)

What we will study… Concurrency Control: ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently. Recovery: ensuring durability and consistency in case of software/hardware failures.

Terminology. Data item: a tuple, table or block. Operations: Read (x), Write (x, 5), Start (T), Commit (T), Abort (T). Active transaction: a transaction which has neither committed nor aborted. Commit: guarantees that the transaction will not be aborted (a bank transaction, for example). Abort: can occur because the transaction itself encounters an error condition, or because of system failures beyond its control. The DBMS itself may also abort a transaction for some reason.

High level model (figure): Transactions 1 to n, Transaction Manager, Scheduler, Recovery Manager, Cache Manager, Disk.

Recoverability (1/2) Transaction T aborts. T wrote some data items, and T’ read items that T wrote. The DBMS has to undo the effects of T and undo the effects of T’. But T’ has already committed. Schedule figure (columns T, T’): Read (x) Write (x, k) Read (y) Write (y, k’) Commit Abort

Recoverability (2/2) Let T1,…,Tn be a set of transactions, where Ti reads a value written by Tk, k < i. An execution of transactions is recoverable if Ti commits only after all such Tk commit. T1 T2 Write (x,2) Read (x) Write (y,2) Commit T1 T2 Write (x,2) Read (x) Write (y,2) Commit
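
The recoverability condition can be checked mechanically. The following is a minimal sketch, assuming a history is encoded as a list of (operation, transaction, item) tuples with "r", "w", "c" and "a" for read, write, commit and abort; this encoding and the function names are illustrative and do not come from the slides.

```python
def reads_from(history):
    """Return (reader, writer) pairs: reader read an item last written by writer."""
    writers = {}  # item -> non-aborted transactions that wrote it, oldest first
    pairs = []
    for op, txn, item in history:
        if op == "w":
            writers.setdefault(item, []).append(txn)
        elif op == "r":
            stack = writers.get(item, [])
            if stack and stack[-1] != txn:
                pairs.append((txn, stack[-1]))
        elif op == "a":
            # An abort removes the transaction's writes from view.
            for stack in writers.values():
                stack[:] = [t for t in stack if t != txn]
    return pairs


def is_recoverable(history):
    """True if every committed reader commits after the writers it read from."""
    commit_pos = {txn: pos for pos, (op, txn, _) in enumerate(history) if op == "c"}
    for reader, writer in reads_from(history):
        if reader in commit_pos:
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True


# T2 reads x from T1; only the commit order differs between the two histories.
t2_commits_first = [("w", 1, "x"), ("r", 2, "x"), ("w", 2, "y"), ("c", 2, None), ("c", 1, None)]
t1_commits_first = [("w", 1, "x"), ("r", 2, "x"), ("w", 2, "y"), ("c", 1, None), ("c", 2, None)]
print(is_recoverable(t2_commits_first))  # False
print(is_recoverable(t1_commits_first))  # True
```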

Cascading Aborts (1/2) Because T was aborted, T1,…, Tk also have to be aborted T T’ T’’ Read (x) Write (x, k) Read (y) Write (y, k’) Abort

Cascading Aborts (2/2) Recoverable executions do not prevent cascading aborts How can we prevent them then ? T1 T2 Write (x,2) Read (x) Write (y,2) Commit T1 T2 Write (x,2) Commit Read (x) Write (y,2)
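
One way to prevent cascading aborts is to allow reads only of committed values. A small sketch of that check, assuming the same illustrative (operation, transaction, item) encoding of a history; the function name is hypothetical.

```python
def avoids_cascading_aborts(history):
    """True if every read of another transaction's write happens after that writer commits."""
    committed = set()
    writers = {}  # item -> non-aborted transactions that wrote it, oldest first
    for op, txn, item in history:
        if op == "c":
            committed.add(txn)
        elif op == "a":
            for stack in writers.values():
                stack[:] = [t for t in stack if t != txn]
        elif op == "w":
            writers.setdefault(item, []).append(txn)
        elif op == "r":
            stack = writers.get(item, [])
            if stack and stack[-1] != txn and stack[-1] not in committed:
                return False  # read of an uncommitted value
    return True


# T2 reads x before T1 commits: recoverable, but an abort of T1 would cascade.
dirty_read = [("w", 1, "x"), ("r", 2, "x"), ("w", 2, "y"), ("c", 1, None), ("c", 2, None)]
# T1 commits before T2 reads x: no cascading abort is possible.
clean_read = [("w", 1, "x"), ("c", 1, None), ("r", 2, "x"), ("w", 2, "y"), ("c", 2, None)]
print(avoids_cascading_aborts(dirty_read))  # False
print(avoids_cascading_aborts(clean_read))  # True
```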

What we learnt so far… Reading a value, committing a transaction Recoverable with cascading aborts Recoverable without cascading aborts Not recoverable T1 T2 Write (x,2) Read (x) Write (y,2) Commit T1 T2 Write (x,2) Read (x) Write (y,2) Commit T1 T2 Write (x,2) Commit Read (x) Write (y,2)

Strict Schedule (1/2) “Undo”-ing the effects of a transaction Restore the before image of the data item T1 T2 Write (x,1) Write (y,3) Write (y,1) Commit Read (x) Abort T1 T2 Write (x,1) Write (y,3) Commit Equivalent to Final value of y: 3

Strict Schedule (2/2) Initial value of x: 1 T1 T2 Write (x,2) Write (x,3) Abort T1 T2 Write (x,2) Write (x,3) Abort T1 T2 Write (x,2) Abort Write (x,3) Should x be restored to 1 or 3? T1 restores x to 3? T2 restores x to 2? Do not read or write a value which has been written by an active transaction until that transaction has committed or aborted
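
The strictness rule at the end of this slide can be sketched as a check over a history. The tuple encoding (operation, transaction, item) and the function name are assumptions made for illustration.

```python
def is_strict(history):
    """True if no item is read or written while another active transaction's write on it is pending."""
    dirty = {}  # item -> active transaction that last wrote it
    for op, txn, item in history:
        if op in ("c", "a"):
            # Commit or abort ends the transaction; its writes are no longer pending.
            for it in [i for i, t in dirty.items() if t == txn]:
                del dirty[it]
        else:
            owner = dirty.get(item)
            if owner is not None and owner != txn:
                return False
            if op == "w":
                dirty[item] = txn
    return True


# T2 overwrites x while T1 is still active: the before image to restore is ambiguous.
overlapping = [("w", 1, "x"), ("w", 2, "x"), ("a", 1, None), ("a", 2, None)]
# T1 aborts before T2 writes x: each undo has a well-defined before image.
separated = [("w", 1, "x"), ("a", 1, None), ("w", 2, "x"), ("a", 2, None)]
print(is_strict(overlapping))  # False
print(is_strict(separated))    # True
```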

The Lost Update Problem Read (x) Write (x, 200,000) Commit Write (x, 200) Assume x is your account balance
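
A toy illustration of how an update gets lost. The deposit scenario, the starting balance and the function names are assumptions chosen for the example; the slide itself only shows the raw reads and writes.

```python
def interleaved_deposits(balance, deposit_1, deposit_2):
    """Both transactions read the balance before either one writes it back."""
    seen_by_t1 = balance               # r1[x]
    seen_by_t2 = balance               # r2[x]
    balance = seen_by_t1 + deposit_1   # w1[x]
    balance = seen_by_t2 + deposit_2   # w2[x] overwrites T1's update
    return balance


def serial_deposits(balance, deposit_1, deposit_2):
    """T1 runs to completion, then T2."""
    balance = balance + deposit_1
    balance = balance + deposit_2
    return balance


print(interleaved_deposits(100, 200_000, 200))  # 300: the 200,000 deposit is lost
print(serial_deposits(100, 200_000, 200))       # 200300
```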

Serializable Schedules. Serial schedule: simply execute transactions one after the other. A serializable schedule is one which is equivalent to some serial schedule.

Serializability Theory

Serializable Schedules T1: op11, op12, op13 T2: op21, op22, op23, op24 Serial schedule Simply execute transactions one after the other op11, op12, op13 op21, op22, op23, op24 op21, op22, op23, op24 op11, op12, op13 Serializable schedule Interleave operations Ensure end result is equivalent to some serial schedule

Notation r1[x] = Transaction 1, Read (x) w1[x] = Transaction 1, Write (x) c1 = Transaction 1, Commit a1= Transaction 1, Abort r1[x], r1[y], w2[x], r2[y], c1, c2

Histories (1/3) Operations of transaction T can be represented by a partial order. r1[x] r1[y] w1[z] c1

Histories (2/3) Conflicting operations Of two ops operating on the same data item, if one of them is a write, then the ops conflict An order has to be specified for conflicting operations

Histories (3/3) Complete History

Serializable Histories The goal: Ensure that the interleaving operations guarantee a serializable history. The method When are two histories equivalent? When is a history serial?

Equivalence of Histories (1/2) H ≅ H’ if (1) they are defined over the same set of transactions and have the same operations, and (2) they order conflicting operations the same way.
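
The two conditions can be turned into a small equivalence test. This is a sketch under simplifying assumptions: histories are total orders encoded as lists of (operation, transaction, item) tuples, no transaction aborts, and no transaction repeats the same operation on the same item. The function names are illustrative.

```python
from collections import Counter


def conflicts(a, b):
    """Two operations of different transactions conflict if they touch the same item and one is a write."""
    (op_a, txn_a, item_a), (op_b, txn_b, item_b) = a, b
    return (txn_a != txn_b and item_a is not None
            and item_a == item_b and "w" in (op_a, op_b))


def conflict_order(history):
    """Set of (earlier, later) pairs of conflicting operations."""
    return {(a, b)
            for i, a in enumerate(history)
            for b in history[i + 1:]
            if conflicts(a, b)}


def equivalent(h1, h2):
    same_operations = Counter(h1) == Counter(h2)
    return same_operations and conflict_order(h1) == conflict_order(h2)


h1 = [("r", 1, "x"), ("r", 2, "y"), ("w", 1, "x"), ("w", 2, "y")]
h2 = [("r", 2, "y"), ("r", 1, "x"), ("w", 2, "y"), ("w", 1, "x")]
h3 = [("w", 1, "x"), ("r", 2, "x")]
h4 = [("r", 2, "x"), ("w", 1, "x")]
print(equivalent(h1, h2))  # True: no conflicting pair is reordered
print(equivalent(h3, h4))  # False: w1[x] and r2[x] are ordered differently
```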

Equivalence of Histories (2/2) (figure) Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman

Serial History. A complete history is serial if, for every pair of transactions Ti and Tk, all operations of Ti occur before all operations of Tk, or all operations of Tk occur before all operations of Ti. A history is serializable if its committed projection is equivalent to a serial history.

Serialization Graph T1 T3 T2

Serializability Theorem: A history H is serializable if and only if its serialization graph SG(H) is acyclic. On your own: How do recoverability, strict schedules and cascading aborts fit into the big picture?
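
A minimal sketch of the test the theorem suggests: build SG(H) from the conflicting operations of committed transactions and check it for cycles. The tuple encoding of histories and the function names are assumptions for illustration.

```python
def serialization_graph(history):
    """Edges Ti -> Tk for each pair of conflicting operations where Ti's operation comes first."""
    committed = {txn for op, txn, _ in history if op == "c"}
    ops = [(op, txn, item) for op, txn, item in history
           if op in ("r", "w") and txn in committed]
    edges = set()
    for i, (op_a, txn_a, item_a) in enumerate(ops):
        for op_b, txn_b, item_b in ops[i + 1:]:
            if txn_a != txn_b and item_a == item_b and "w" in (op_a, op_b):
                edges.add((txn_a, txn_b))
    return edges


def is_acyclic(edges):
    """Repeatedly remove nodes without incoming edges; a cycle leaves nodes behind."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if not any(dst == n for _, dst in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(s, d) for s, d in remaining if s not in sources}
    return True


# r1[x] w2[x] w2[y] c2 w1[y] c1 gives T1 -> T2 (on x) and T2 -> T1 (on y).
cyclic = [("r", 1, "x"), ("w", 2, "x"), ("w", 2, "y"), ("c", 2, None),
          ("w", 1, "y"), ("c", 1, None)]
# r1[x] r1[y] w2[x] r2[y] c1 c2 gives only T1 -> T2.
acyclic = [("r", 1, "x"), ("r", 1, "y"), ("w", 2, "x"), ("r", 2, "y"),
           ("c", 1, None), ("c", 2, None)]
print(is_acyclic(serialization_graph(cyclic)))   # False
print(is_acyclic(serialization_graph(acyclic)))  # True
```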

Locking

High level model (figure): Transactions 1 to n, Transaction Manager, Scheduler, Recovery Manager, Cache Manager, Disk.

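Transaction Management. The Transaction Manager receives transactions (Transaction 1 … Transaction n) and sends their operations to the scheduler, for example Read1(x), Write2(y,k), Read2(x), Commit1. For each operation, the Scheduler can execute it, reject it or delay it. (Figure components: transactions, Transaction Manager, Scheduler, Disk.)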

Locking: Each data item x has a lock associated with it. If T wants to access x, the scheduler first acquires a lock on x. Only one transaction can hold a lock on x. T releases the lock after processing. Locking is used by the scheduler to ensure serializability.

Notation. Read lock and write lock: rl[x], wl[x]. Obtaining read and write locks: rli[x], wli[x]. Lock table: entries of the form [x, r, Ti]. Conflicting locks: pli[x] and qlk[y] conflict if x = y and p, q conflict. Unlock: rui[x], wui[x].

Basic 2-Phase Locking (2PL). RULE 1: On receiving pi[x], check whether some qlk[x] is set such that p and q conflict. If NO, acquire pli[x] and pi[x] is scheduled. If YES, pi[x] is delayed. RULE 2: pli[x] cannot be released until pi[x] is completed. RULE 3 (2 Phase Rule): Once a lock is released no other locks may be obtained.
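
A rough sketch of Rules 1 and 3 as a lock table. The class and method names are hypothetical, Rule 2 is not modelled (operations are treated as completing instantly), and a delayed request is simply reported to the caller instead of being queued.

```python
class TwoPhaseLockScheduler:
    """Toy lock manager for basic 2PL (illustrative only)."""

    def __init__(self):
        self.locks = {}         # item -> set of (mode, txn) locks currently held
        self.shrinking = set()  # transactions that have already released a lock

    def request(self, txn, mode, item):
        """Rule 1: schedule the operation unless a conflicting lock is held by another transaction."""
        if txn in self.shrinking:
            # Rule 3: no new locks once any lock has been released.
            raise RuntimeError(f"T{txn} violates the two-phase rule")
        held = self.locks.setdefault(item, set())
        for held_mode, holder in held:
            if holder != txn and "w" in (mode, held_mode):
                return "delayed"
        held.add((mode, txn))
        return "scheduled"

    def release_all(self, txn):
        """Release every lock of txn, which starts (and ends) its shrinking phase."""
        self.shrinking.add(txn)
        for item in self.locks:
            self.locks[item] = {(m, t) for m, t in self.locks[item] if t != txn}


scheduler = TwoPhaseLockScheduler()
print(scheduler.request(1, "r", "x"))  # scheduled
print(scheduler.request(2, "r", "x"))  # scheduled: read locks do not conflict
print(scheduler.request(2, "w", "x"))  # delayed: T1 holds a read lock on x
scheduler.release_all(1)
print(scheduler.request(2, "w", "x"))  # scheduled
```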

The 2-phase rule Once a lock is released no other locks may be obtained. T1: r1[x] w1[y] c1 T2: w2[x] w2[y] c2 H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1 T1 T2
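
The history H on this slide breaks the two-phase rule: T1 releases its read lock on x and later obtains a write lock on y. A small sketch that detects this, assuming lock operations are written as whitespace-separated tokens in the slides' notation (rl1[x], wu2[y], and so on); the function name is illustrative.

```python
import re

# Matches lock and unlock tokens such as rl1[x] or wu2[y]; r1[x], w2[y] and c1 do not match.
LOCK_OP = re.compile(r"^([rw])([lu])(\d+)\[(\w+)\]$")


def obeys_two_phase_rule(history):
    """True if no transaction obtains a lock after it has released one."""
    released = set()
    for token in history.split():
        match = LOCK_OP.match(token)
        if not match:
            continue  # data operations and commits are ignored
        _, action, txn, _ = match.groups()
        if action == "u":
            released.add(txn)
        elif txn in released:
            return False
    return True


H = ("rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 "
     "wl1[y] w1[y] wu1[y] c1")
two_phase = ("rl1[x] r1[x] wl1[y] w1[y] ru1[x] wu1[y] c1 "
             "wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2")
print(obeys_two_phase_rule(H))          # False
print(obeys_two_phase_rule(two_phase))  # True
```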

Correctness of 2PL 2PL always produces serializable histories Proof outline STEP 1: Characterize properties of the scheduler STEP 2: Prove that any history with these properties is serializable (That is, SG(H) is acyclic)

Deadlocks (1/2) T1: r1[x] w1[y] c1 T2: w2[y] w2[x] c2 Scheduler rl1[x] wl2[y] r1[x] w2[y] <cannot proceed>

Deadlocks (2/2) Strategies to deal with deadlocks Timeouts Leads to inefficiency Detecting deadlocks Maintain a wait-for graph, cycle indicates deadlock Once a deadlock is detected, break the cycle by aborting a transaction New problem: Starvation
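
Deadlock detection on a wait-for graph amounts to finding a cycle. A minimal sketch, assuming the graph is given as a dict that maps each transaction to the set of transactions it is waiting for; the representation and function name are illustrative.

```python
def find_deadlock(wait_for):
    """Return the transactions on some cycle of the wait-for graph, or None if there is none."""
    path, finished = [], set()

    def visit(txn):
        if txn in finished:
            return None
        if txn in path:
            return path[path.index(txn):]  # the cycle closes at txn
        path.append(txn)
        for other in wait_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        path.pop()
        finished.add(txn)
        return None

    for txn in list(wait_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# T1 holds rl[x] and waits for y; T2 holds wl[y] and waits for x.
print(find_deadlock({1: {2}, 2: {1}}))    # [1, 2]
print(find_deadlock({1: {2}, 2: set()}))  # None
```

Aborting any transaction on the returned cycle breaks the deadlock; if the same transaction is chosen as the victim every time, it can starve, which is the new problem the slide mentions.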

Conservative 2PL Avoids deadlocks altogether T declares its readset and writeset Scheduler tries to acquire all required locks If not all locks can be acquired, T waits in a queue T never “starts” until all locks are acquired Therefore, it can never be involved in a deadlock On your own Strict 2PL (2PL which ensures only strict schedules)

Extra Information Assumption: Data items are organized in a tree Can we come up with a better (more efficient) protocol?

Tree Locking Protocol (1/3). RULE 1: On receiving ai[x], check whether alk[x] is set. If YES, ai[x] is delayed. If NO, continue with Rule 2. RULE 2: If x is an intermediate node and y is the parent of x, then ali[x] is possible only if Ti holds ali[y]. RULE 3: ali[x] cannot be released until ai[x] is completed. RULE 4: Once a lock is released, the same lock may not be re-obtained.

Tree Locking Protocol (2/3) Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk. Theorem: Tree Locking Protocol always produces Serializable Schedules

Tree Locking Protocol (3/3) Tree Locking Protocol avoids deadlock Releases locks earlier than 2PL BUT Needs to know the access pattern to be effective Transactions should access nodes from root-to-leaf

Multi-granularity Locking (1/3) Refers to the relative size of the data item Attribute, tuple, table, page, file, etc. Efficiency depends on granularity of locking Allow transactions to lock at different granularities

Multi-granularity Locking (2/3) Lock Instance Graph Explicit and Implicit Locks Intention read and intention write locks Intention locks conflict with explicit read and write locks but not with other intention locks Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman

Multi-granularity Locking (3/3) To set rli[x] or irli[x], first hold irli[y] or iwli[y], such that y is the parent of x. To set wli[x] or iwli[x], first hold iwli[y], such that y is the parent of x. To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x. To release irli[x] (or iwli[x]) no child of x can be locked by Ti
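
The first two rules say which lock a transaction must already hold on the parent before it may lock a child. A small sketch of that check, assuming the lock instance graph is a tree stored as a child-to-parent dict and that the locks held by one transaction are stored per node; the names and the example hierarchy are hypothetical.

```python
# Lock types that must be held on the parent before each lock type can be set on a child.
REQUIRED_ON_PARENT = {
    "r": {"ir", "iw"},
    "ir": {"ir", "iw"},
    "w": {"iw"},
    "iw": {"iw"},
}


def may_set(lock, node, held, parent):
    """True if the transaction's locks on the parent of node allow it to set lock on node."""
    above = parent.get(node)
    if above is None:
        return True  # the root has no parent to check
    return bool(REQUIRED_ON_PARENT[lock] & held.get(above, set()))


parent = {"db": None, "table": "db", "tuple": "table"}
held = {"db": {"ir"}}  # the transaction holds an intention read lock on the database
print(may_set("r", "table", held, parent))  # True
print(may_set("w", "table", held, parent))  # False: needs iw on the database first
```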

The Phantom Problem: How to lock a tuple which (currently) does not exist? T1: r1[x1], r1[x2], r1[X], c1 T2: w2[x3], w2[X], c2 rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], wu2[x3,X], c2, rl1[X], ru1[x1,x2,X], c1

Non-lock-based schedulers

Timestamp Ordering (1/3) Each transaction is associated with a timestamp Ti indicates Transaction T with timestamp i. Each operation in the transaction has the same timestamp

Timestamp Ordering (2/3) TO Rule If pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.

Timestamp Ordering (3/3) For each data item x, maintain max-rt(x) (the largest timestamp of a read of x), max-wt(x) (the largest timestamp of a write of x) and c(x) (whether the last write of x has been committed). Request ri[x]: grant the request if TS(i) >= max-wt(x) and c(x), and update max-rt(x); delay it if TS(i) > max-wt(x) and !c(x); else abort and restart Ti. Request wi[x]: grant the request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x), update max-wt(x) and set c(x) = false. ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort.
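
The read and write rules above translate almost directly into code. This is a sketch only: the Item class and function names are illustrative, a delayed request is reported to the caller instead of being queued, and a write that fails the test is aborted, which is an assumption because the slide leaves that case (and the Thomas write rule) as an exercise.

```python
from dataclasses import dataclass


@dataclass
class Item:
    max_rt: int = 0         # max-rt(x): largest timestamp that has read x
    max_wt: int = 0         # max-wt(x): largest timestamp that has written x
    committed: bool = True  # c(x): has the last write of x been committed?


def request_read(ts, item):
    if ts >= item.max_wt and item.committed:
        item.max_rt = max(item.max_rt, ts)
        return "grant"
    if ts > item.max_wt and not item.committed:
        return "delay"
    return "abort"


def request_write(ts, item):
    if ts >= item.max_wt and ts >= item.max_rt:
        item.max_wt = ts
        item.committed = False
        return "grant"
    return "abort"  # assumption: the slide does not say what happens here


x = Item()
print(request_write(5, x))  # grant
print(request_read(7, x))   # delay: the write by timestamp 5 is not committed yet
print(request_read(3, x))   # abort: timestamp 3 arrives after a later write
x.committed = True          # the writer commits
print(request_read(7, x))   # grant
print(request_write(6, x))  # abort: x was already read by timestamp 7
```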

Validation Aggressively schedule all operations Do not commit until the transaction is “validated” ON YOUR OWN

Summary. Lock-based Schedulers: 2-Phase Locking, Tree Locking Protocol, Multi-granularity Locking, Locking in the presence of updates. Non-lock-based Schedulers: Timestamp Ordering, Validation-based Concurrency Control (on your own).

Recovery. SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom

Logging Log the operations in the transaction(s) Believe the log Does the log say transaction T has committed? Or does it say aborted? Or has only a partial trace (implicit abort)? In case of failures, reconstruct the DB from its log

The basic setup Buffer Space for each transaction Buffer Space for data and log Transactions LOG T1 The Disk T2 T3 Tk

Terminology. Data item: an element which can be read or written (tuple, relation, B+-tree index, etc). Input x: fetch x from the disk to buffer. Read x,t: read x into local variable t. Write x,t: write value of t into x. Output x: write x to disk.

Example update Airlines set price = price - price*0.1, status = “cheap” where price < 5000 Read P, x x -= x* 0.1 Write x,P Read S, y y = “CHEAP” Write y, S Output P Output S System fails here System fails here System fails here

Logs Sequence of log records Need to keep track of Start of transaction Update operations (Write operations) End of transaction (COMMIT or ABORT) “Believe” the log, use the log to reconstruct a consistent DB state

Types of logs. Undo logs: ensure that uncommitted transactions are rolled back (or undone). Redo logs: ensure that committed transactions are redone. Undo/Redo logs: both of the above. All 3 logging styles ensure atomicity and durability.

Undo Logging (1/3) <START T>: Start of transaction T <COMMIT T> <ABORT T> <T, A, x>: Transaction T modified A whose before-image is x.

Undo Logging (2/3) Actions: Read P, x; x -= x* 0.1; Write x,P; Read S, y; y = “CHEAP”; Write y, S; FLUSH LOG; Output P; Output S. Log: <START T>, <T, P, x>, <T, S, y>, <COMMIT T>. U1: <T, X, v> should be flushed before Output X. U2: <COMMIT T> should be flushed after all OUTPUTs.

Undo Logging (3/3) Recovery with Undo log If T has a <COMMIT T> entry, do nothing If T has a <START T> entry, but no <COMMIT T> T is incomplete and needs to be undone Restore old values from <T,X,v> records There may be multiple transactions Start scanning from the end of the log
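
The recovery procedure can be sketched in a few lines. The log encoding is an assumption made for the example: a list of tuples ("START", T), ("UPDATE", T, item, before_image) and ("COMMIT", T), with the disk modelled as a dict. Following the slide, any transaction without a COMMIT record is treated as incomplete.

```python
def undo_recover(log, disk):
    """Scan the log from the end and restore before-images of uncommitted transactions."""
    committed = set()
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind == "COMMIT":
            committed.add(txn)
        elif kind == "UPDATE" and txn not in committed:
            _, _, item, before_image = record
            disk[item] = before_image
    return disk


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 4000),      # T1 changed P; its old value was 4000
    ("START", "T2"),
    ("UPDATE", "T2", "S", "normal"),  # T2 changed S; its old value was "normal"
    ("COMMIT", "T1"),
]
disk = {"P": 3600, "S": "cheap"}      # state found on disk after the crash
print(undo_recover(log, disk))        # {'P': 3600, 'S': 'normal'}
```

Scanning backwards means the COMMIT record of a transaction is seen before its updates, and when several incomplete transactions wrote the same item the oldest before-image is the one that remains.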

Redo Logging (1/3) All incomplete transactions can be ignored Redo all completed transactions <T, A, x>: Transaction T modified A whose after-image is x.

Redo Logging (2/3) Actions: Read P, x; x -= x* 0.1; Write x,P; Read S, y; y = “CHEAP”; Write y, S; FLUSH LOG; Output P; Output S. Log: <START T>, <T, P, x>, <T, S, y>, <COMMIT T>. R1: <T, X, v> and <COMMIT T> should be flushed before Output X (write-ahead logging).

Redo Logging (3/3) Recovery with Redo Logging If T has a <COMMIT T> entry, redo T If T is incomplete, do nothing (add <ABORT T>) For multiple transactions Scan from the beginning of the log
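
A matching sketch for redo recovery, assuming a log encoded as a list of tuples ("START", T), ("UPDATE", T, item, after_image) and ("COMMIT", T), with the disk modelled as a dict. The encoding and function name are illustrative.

```python
def redo_recover(log, disk):
    """Scan the log from the beginning and reapply after-images of committed transactions."""
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            _, _, item, after_image = record
            disk[item] = after_image
    return disk


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 3600),     # T1 set P to 3600
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "S", "cheap"),  # T2 never committed, so this is ignored
]
disk = {"P": 4000, "S": "normal"}    # T1's output had not reached the disk yet
print(redo_recover(log, disk))       # {'P': 3600, 'S': 'normal'}
```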

Undo/Redo Logging (1/3) Undo logging: Cannot COMMIT T unless all updates are written to disk Redo logging: Cannot release memory unless transaction commits Undo/Redo logs attempt to strike a balance

Undo/Redo Logging (2/3) Actions: Read P, x; x -= x* 0.1; Write x,P; Read S, y; y = “CHEAP”; Write y, S; FLUSH LOG; Output P; Output S. Log: <START T>, <T, P, x, a>, <T, S, y, b>, <COMMIT T>. UR1: <T, X, a, b> should be flushed before Output X. For comparison, undo logging uses U1: <T, X, v> should be flushed before Output X, and U2: <COMMIT T> should be flushed after all OUTPUTs; redo logging uses R1: <T, X, v> and <COMMIT T> should be flushed before Output X.

Undo/Redo Logging (3/3) Recovery with Undo/Redo Logging Redo all committed transactions (earliest-first) Undo all uncommitted transactions (latest-first) What happens if there is a crash when you are writing a log? What happens if there is a crash during recovery?
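
The two passes can be sketched as follows. The log encoding is an assumption: update records are tuples ("UPDATE", T, item, before_image, after_image), and treating the first value as the before-image is a choice made for this example. The disk is modelled as a dict.

```python
def undo_redo_recover(log, disk):
    """Redo committed transactions earliest-first, then undo the rest latest-first."""
    committed = {record[1] for record in log if record[0] == "COMMIT"}
    for record in log:                 # redo pass, from the start of the log
        if record[0] == "UPDATE" and record[1] in committed:
            disk[record[2]] = record[4]
    for record in reversed(log):       # undo pass, from the end of the log
        if record[0] == "UPDATE" and record[1] not in committed:
            disk[record[2]] = record[3]
    return disk


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 4000, 3600),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "S", "normal", "cheap"),
]
disk = {"P": 4000, "S": "cheap"}  # T1's change is missing, T2's change already reached the disk
print(undo_redo_recover(log, disk))  # {'P': 3600, 'S': 'normal'}
```

Both passes only assign logged values, so running the procedure again after a crash during recovery gives the same result.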

Checkpointing Logs can be huge…can we throw away portions of it? Can we avoid processing all of it when there is a crash? ON YOUR OWN