Transaction Management


1 Transaction Management
Dale-Marie Wilson, Ph.D.

2 Transaction Support
Transaction: an action, or series of actions, carried out by a user or application program, which reads or updates the contents of the database
A logical unit of work on the database
An application program is a series of transactions with non-database processing in between
Transforms the database from one consistent state to another, although consistency may be violated during the transaction

3 Transaction Example

4 Transaction Support
Two possible outcomes:
Success: the transaction commits and the database reaches a new consistent state
Failure: the transaction aborts, and the database is restored to the consistent state that existed before the transaction started; such a transaction is said to be rolled back or undone
A committed transaction cannot be aborted
An aborted transaction that is rolled back can be restarted later
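The commit/rollback outcomes described above can be observed with any transactional DBMS. A minimal sketch using Python's built-in sqlite3 module (the table name and amounts are illustrative, not from the slides):

```python
import sqlite3

# In-memory database with a single accounts table (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('x', 100)")
conn.commit()

# A successful transaction: withdraw 10, then commit.
conn.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'x'")
conn.commit()

# A failed transaction: the update is applied, then rolled back,
# restoring the consistent state that existed before the transaction.
conn.execute("UPDATE accounts SET balance = balance - 999 WHERE name = 'x'")
conn.rollback()

balance = conn.execute("SELECT balance FROM accounts WHERE name = 'x'").fetchone()[0]
print(balance)  # 90: the committed withdrawal persists, the rolled-back one does not
```

Note that once the first transaction commits, no later rollback can undo it, matching the rule that a committed transaction cannot be aborted.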

5 State Transition Diagram for Transaction

6 Properties of Transaction
Basic (ACID) properties:
Atomicity: the 'all or nothing' property; a transaction is an indivisible unit that is performed in its entirety or not at all
Consistency: must transform the database from one consistent state to another
Isolation: partial effects of incomplete transactions should not be visible to other transactions
Durability: effects of a committed transaction are permanent and must not be lost because of a later failure

7 DBMS Transaction Subsystem

8 Concurrency Control
The process of managing simultaneous operations on the database without having them interfere with one another
Prevents interference when two or more users access the database simultaneously and at least one updates data
Interleaving of operations may produce incorrect results

9 Need for Concurrency Control
Potential concurrency problems:
Lost update problem
Uncommitted dependency problem
Inconsistent analysis problem

10 Lost Update Problem
A successfully completed update is overridden by another user
Example:
T1 is withdrawing £10 from an account with balance balx, initially £100
T2 is depositing £100 into the same account
Serially, the final balance would be £190

11 Lost Update Problem Loss of T2’s update avoided by preventing T1 from reading balx until after update
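The lost update interleaving can be replayed deterministically in a few lines of Python (an illustrative sketch of the schedule above, not from the slides; sequential statements stand in for the interleaved steps):

```python
# T1 withdraws 10 and T2 deposits 100 into the same account (bal_x = 100).
bal_x = 100

# Interleaved schedule: both transactions read before either writes.
t1_local = bal_x          # T1: read(bal_x) -> 100
t2_local = bal_x          # T2: read(bal_x) -> 100
bal_x = t2_local + 100    # T2: write(bal_x) -> 200
bal_x = t1_local - 10     # T1: write(bal_x) -> 90, overwriting T2's update
lost_update_result = bal_x

# Serial schedule (T2 then T1) gives the correct balance of 190.
bal_x = 100
bal_x = bal_x + 100       # T2 runs to completion first
bal_x = bal_x - 10        # then T1
serial_result = bal_x
print(lost_update_result, serial_result)  # 90 190
```

The £100 deposit is lost precisely because T1 read bal_x before T2's write, which is why blocking T1's read until after T2's update avoids the problem.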

12 Uncommitted Dependency Problem
Occurs when one transaction can see the intermediate results of another transaction before it has committed
Example:
T4 updates balx to £200 but then aborts, so balx should be back at its original value of £100
T3 has read the new value of balx (£200) and uses this value as the basis of a £10 reduction, giving a new balance of £190, instead of £90

13 Uncommitted Dependency Problem
Problem avoided by preventing T3 from reading balx until after T4 commits or aborts

14 Inconsistent Analysis Problem
Occurs when a transaction reads several values, but a second transaction updates some of them during the execution of the first
Also known as a dirty read or unrepeatable read
Example:
T6 is totaling the balances of account x (£100), account y (£50), and account z (£25)
Meanwhile, T5 has transferred £10 from balx to balz, so T6 now has the wrong result (£10 too high)

15 Inconsistent Analysis Problem
Problem avoided by preventing T6 from reading balx and balz until after T5 completed updates

16 Serializability
Objective of a concurrency control protocol: to schedule transactions in a way that avoids interference
Transactions could be executed serially, but this limits the degree of concurrency or parallelism in the system
Serializability identifies those executions of transactions guaranteed to ensure consistency

17 Serializability
Schedule: a sequence of reads/writes by a set of concurrent transactions
Serial schedule: a schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions
There is no guarantee that the results of all serial executions of a given set of transactions will be identical

18 Nonserial Schedule
Nonserial schedule: a schedule where operations from a set of concurrent transactions are interleaved
The objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another
That is, to find nonserial schedules that are equivalent to some serial schedule; such schedules are called serializable

19 Serializability
In serializability, the ordering of reads/writes is important:
(a) If two transactions only read a data item, they do not conflict and order is not important
(b) If two transactions either read or write completely separate data items, they do not conflict and order is not important
(c) If one transaction writes a data item and another reads or writes the same data item, the order of execution is important

20 Conflict Serializability Example

21 Serializability
A conflict serializable schedule orders conflicting operations in a way that achieves the equivalent of serial execution
Under the constrained write rule (a transaction updates a data item based on its old value, which is first read), a precedence graph can be used to test for serializability

22 Precedence Graph
To create the graph:
Create a node for each transaction
Create a directed edge Ti → Tj, if Tj reads the value of an item written by Ti
Create a directed edge Ti → Tj, if Tj writes a value into an item after it has been read by Ti
If the precedence graph contains a cycle, the schedule is not conflict serializable
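The precedence-graph test is mechanical enough to sketch in Python (illustrative code, not from the slides; the schedule is encoded as (transaction, operation, item) triples, and an edge is added for every conflicting pair of operations):

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build a precedence graph from a schedule of (txn, op, item) triples,
    where op is 'r' or 'w'. Add edge Ti -> Tj for each conflicting pair
    (at least one write, same item, different transactions) in which
    Ti's operation comes before Tj's."""
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and (op_i == 'w' or op_j == 'w'):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle; a cycle means not conflict serializable."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    def visit(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False
    return any(colour[n] == WHITE and visit(n) for n in list(edges))

# An interleaving in the spirit of the T9/T10 example: the transactions
# conflict on x in one order and on y in the other, creating a cycle.
schedule = [('T10', 'r', 'x'), ('T9', 'r', 'x'), ('T10', 'w', 'x'),
            ('T9', 'w', 'x'), ('T9', 'r', 'y'), ('T10', 'r', 'y'),
            ('T9', 'w', 'y'), ('T10', 'w', 'y')]
print(has_cycle(precedence_graph(schedule)))  # True: not conflict serializable
```

A serial schedule of the same operations produces a graph with edges in only one direction, and the cycle test passes.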

23 Non-conflict Serializable Schedule Example
T9 is transferring £100 from one account with balance balx to another account with balance baly
T10 is increasing the balance of these two accounts by 10%
The precedence graph has a cycle, and so the schedule is not serializable

24 Non-conflict Serializable Schedule Example

25 Recoverability
Serializability identifies schedules that maintain database consistency, assuming no failed transactions
If a transaction fails, atomicity requires the effects of the transaction to be undone
Durability states that once a transaction commits, its changes cannot be undone (without running another, compensating, transaction)

26 Recoverable Schedule A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj

27 Concurrency Control Techniques
Conservative approaches: delay transactions in case they conflict with other transactions
Optimistic methods: assume conflict is rare and only check for conflicts at commit
Two conservative concurrency control techniques:
Locking
Timestamping

28 Locking
A transaction uses locks to deny access to other transactions and so prevent incorrect updates
The most widely used approach to ensure serializability
A transaction claims a shared (read) or exclusive (write) lock on a data item before a read or write
The lock prevents another transaction from modifying the item, or even reading it in the case of a write lock

29 Basic Locking Rules
If a transaction has a shared lock on an item, it can read but not update the item
If a transaction has an exclusive lock on an item, it can both read and update the item
More than one transaction can hold shared locks simultaneously on the same item, since reads cannot conflict
An exclusive lock gives a transaction exclusive access to that item
Some systems allow a transaction to:
Upgrade a read lock to an exclusive lock
Downgrade an exclusive lock to a shared lock
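These rules can be sketched as a tiny lock table in Python (an illustrative toy, not a real lock manager: conflicting requests are simply refused rather than queued, and upgrade is allowed only for a sole holder):

```python
class LockManager:
    """Minimal shared/exclusive lock table (illustrative sketch)."""
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holders); mode is 'S' or 'X'

    def acquire(self, txn, item, mode):
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if mode == 'S' and held_mode == 'S':
            holders.add(txn)                    # shared locks are compatible
            return True
        if holders == {txn}:
            self.locks[item] = (mode, holders)  # sole holder may upgrade
            return True
        return False                            # conflicting request refused

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lm = LockManager()
print(lm.acquire('T1', 'bal_x', 'S'))  # True: first shared lock granted
print(lm.acquire('T2', 'bal_x', 'S'))  # True: shared locks coexist
print(lm.acquire('T3', 'bal_x', 'X'))  # False: exclusive conflicts with shared
lm.release('T2', 'bal_x')
print(lm.acquire('T1', 'bal_x', 'X'))  # True: sole remaining holder upgrades
```

The refused exclusive request is exactly the situation where a real DBMS would make T3 wait, which is where deadlock (covered later) can arise.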

30 Incorrect Locking Schedule
For the two transactions above, a valid schedule using these rules is:
S = {write_lock(T9, balx), read(T9, balx), write(T9, balx), unlock(T9, balx),
write_lock(T10, balx), read(T10, balx), write(T10, balx), unlock(T10, balx),
write_lock(T10, baly), read(T10, baly), write(T10, baly), unlock(T10, baly), commit(T10),
write_lock(T9, baly), read(T9, baly), write(T9, baly), unlock(T9, baly), commit(T9)}

31 Incorrect Locking Schedule
If at the start balx = 100 and baly = 400, the result should be:
balx = 220, baly = 330, if T9 executes before T10, or
balx = 210, baly = 340, if T10 executes before T9
However, the schedule gives balx = 220 and baly = 340, so S is not a serializable schedule
Problem: the transactions release their locks too soon, resulting in a loss of total isolation and atomicity
To guarantee serializability, an additional protocol concerning the positioning of lock and unlock operations in every transaction is needed

32 Two-Phase Locking (2PL)
A transaction follows the 2PL protocol if all locking operations precede the first unlock operation in the transaction
Two phases for each transaction:
Growing phase: acquires all locks but cannot release any locks
Shrinking phase: releases locks but cannot acquire any new locks
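The 2PL condition is a simple property of a transaction's lock/unlock sequence, which can be checked mechanically (illustrative Python sketch, not from the slides):

```python
def follows_2pl(ops):
    """Check that a sequence of 'lock'/'unlock' operations obeys two-phase
    locking: once the first unlock occurs (shrinking phase), no further
    locks may be acquired."""
    shrinking = False
    for op in ops:
        if op == 'unlock':
            shrinking = True
        elif op == 'lock' and shrinking:
            return False          # acquiring a lock after an unlock violates 2PL
    return True

print(follows_2pl(['lock', 'lock', 'unlock', 'unlock']))  # True: grow then shrink
print(follows_2pl(['lock', 'unlock', 'lock']))            # False: lock after unlock
```

The incorrect schedule on the previous slide fails this test: T9 unlocks balx and only later locks baly.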

33 Preventing Lost Update Problem using 2PL

34 Preventing Uncommitted Dependency Problem using 2PL

35 Preventing Inconsistent Analysis Problem using 2PL

36 Cascading Rollback
If every transaction in a schedule follows 2PL, the schedule is serializable
However, problems can still occur with the interpretation of when locks can be released

37 Cascading Rollback

38 Cascading Rollback
The transactions conform to 2PL, but T14 aborts
Since T15 is dependent on T14, T15 must also be rolled back
Since T16 is dependent on T15, T16 must also be rolled back
To prevent this with 2PL, leave the release of all locks until the end of the transaction

39 Concurrency Control with Index Structures
Each page of the index is treated as a data item and 2PL applied
Problem: indexes are frequently accessed, which leads to high lock contention
Observations about index traversal:
The search path starts from the root and moves down to the leaf nodes; the search never moves back up the tree
Once a lower-level node has been accessed, the higher-level nodes in the path will not be used again
Insertion of a new index value (key and pointer): if the node is not full, the insertion will not cause changes to higher-level nodes

40 Concurrency Control with Index Structures
If the insertion is to a non-full node, exclusively lock only the leaf node
If the node is full and must split, exclusively lock the higher-level nodes as well
Locking strategy:
For searches, obtain shared locks on nodes starting at the root and proceeding downwards along the required path; release the lock on a node once a lock has been obtained on its child node
For insertions, a conservative approach is to obtain exclusive locks on all nodes as we descend the tree to the leaf node to be modified
An optimistic approach is to obtain shared locks on all nodes as we descend to the leaf node to be modified, where we obtain an exclusive lock; if the leaf node has to split, upgrade the shared lock on the parent to an exclusive lock, and if this node also has to split, continue to upgrade locks at the next higher level

41 Deadlock An impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released

42 Deadlock
To break a deadlock: abort one or more of the transactions
Deadlock is transparent to the user; the DBMS restarts the transaction(s)
Three techniques for handling deadlock:
Timeouts
Deadlock prevention
Deadlock detection and recovery

43 Timeouts Transaction that requests lock will only wait for system-defined period of time If lock not been granted within period Lock request times out DBMS assumes transaction deadlocked => aborts and automatically restarts the transaction

44 Deadlock Prevention
The DBMS looks ahead to see if a transaction would cause deadlock, and never allows deadlock to occur
Can order transactions using transaction timestamps:
Wait-Die: only an older transaction can wait for a younger one; otherwise the transaction is aborted (dies) and restarted with the same timestamp
Wound-Wait: only a younger transaction can wait for an older one; if an older transaction requests a lock held by a younger one, the younger one is aborted (wounded)
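The two prevention schemes reduce to a comparison of timestamps, which can be written out directly (illustrative Python sketch; the return strings are my own labels for the actions):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester (smaller timestamp) waits; a younger
    requester dies (aborts, to restart later with its original timestamp)."""
    return 'wait' if requester_ts < holder_ts else 'die'

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the younger holder;
    a younger requester waits for the older holder."""
    return 'wound holder' if requester_ts < holder_ts else 'wait'

# A transaction with timestamp 5 is older than one with timestamp 9.
print(wait_die(5, 9))    # 'wait'         : older waits for younger
print(wait_die(9, 5))    # 'die'          : younger aborts itself
print(wound_wait(5, 9))  # 'wound holder' : older aborts the younger holder
print(wound_wait(9, 5))  # 'wait'         : younger waits for older
```

In both schemes, waiting only ever happens in one direction of the age ordering, so no cycle of waits (and hence no deadlock) can form.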

45 Deadlock Detection and Recovery
The DBMS allows deadlock to occur, then recognizes it and breaks it
Handled by construction of a wait-for graph (WFG) showing transaction dependencies:
Create a node for each transaction
Create an edge Ti → Tj, if Ti is waiting to lock an item locked by Tj
Deadlock exists if and only if the WFG contains a cycle
The WFG is created at regular intervals
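Under the simplifying assumption that each blocked transaction waits for exactly one lock holder, the WFG cycle check reduces to chain-following (an illustrative Python sketch, not a full detector):

```python
def find_deadlock(waits_for):
    """waits_for maps each blocked transaction to the transaction holding
    the lock it wants (one outgoing edge per node, a common simplification).
    Follow the chain from each node; revisiting a node already on the
    current chain means the wait-for graph contains a cycle, i.e. deadlock."""
    for start in waits_for:
        seen, node = set(), start
        while node in waits_for:
            if node in seen:
                return sorted(seen)   # transactions on the deadlocked chain
            seen.add(node)
            node = waits_for[node]
    return None

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(find_deadlock({'T1': 'T2', 'T2': 'T1'}))  # ['T1', 'T2']
print(find_deadlock({'T1': 'T2', 'T2': 'T3'}))  # None: the chain terminates
```

A real DBMS would then pick a victim from the returned set and roll it back, per the recovery issues on the next slide.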

46 Recovery from Deadlock Detection
Several issues:
Choice of deadlock victim
How far to roll the transaction back
Avoiding starvation, where the same transaction is always chosen as the deadlock victim

47 Timestamping
Transactions are ordered globally:
Older transactions, i.e. transactions with smaller timestamps, get priority in the event of conflict
Conflict is resolved by rolling back and restarting the transaction
No locks are used, so there can be no deadlock

48 Timestamping
Timestamp: a unique identifier created by the DBMS that indicates the relative starting time of a transaction
Generated by:
Using the system clock at the start of the transaction, or
Incrementing a logical counter every time a new transaction starts

49 Timestamping
A read/write proceeds only if the last update on that data item was carried out by an older transaction
Otherwise, the transaction requesting the read/write is restarted and given a new timestamp
Timestamps for data items:
read_timestamp: timestamp of the last transaction to read the item
write_timestamp: timestamp of the last transaction to write the item

50 Timestamping - Read(x)
Consider a transaction T with timestamp ts(T):
If ts(T) < write_timestamp(x), x has already been updated by a younger (later) transaction
Any other values T has read may be inconsistent, so the transaction must be aborted and restarted with a new timestamp

51 Timestamping - Read(x)
If ts(T) < read_timestamp(x), x has already been read by a younger transaction
Roll back the transaction and restart it using a later timestamp

52 Timestamping - Write(x)
If ts(T) < write_timestamp(x), x has already been written by a younger transaction, which means T is basing its write on an obsolete value
The write can safely be ignored: the "ignore obsolete write" rule
Otherwise, the operation is accepted and executed
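The read and write rules from slides 50-52 can be combined into one check function (an illustrative Python sketch of basic timestamp ordering, including the write-timestamp test for writes; the data item is a plain dict, which is my own representation):

```python
def basic_to(op, ts, item):
    """Basic timestamp ordering check for a single operation.
    item holds 'read_ts' and 'write_ts'; returns the action taken."""
    if op == 'read':
        if ts < item['write_ts']:
            return 'abort'                 # item written by a younger transaction
        item['read_ts'] = max(item['read_ts'], ts)
        return 'ok'
    if op == 'write':
        if ts < item['read_ts']:
            return 'abort'                 # item already read by a younger transaction
        if ts < item['write_ts']:
            return 'ignore'                # the "ignore obsolete write" rule
        item['write_ts'] = ts
        return 'ok'

x = {'read_ts': 0, 'write_ts': 0}
print(basic_to('write', 5, x))  # 'ok'    : write_ts becomes 5
print(basic_to('read', 3, x))   # 'abort' : T3 is older than the writer T5
print(basic_to('read', 7, x))   # 'ok'    : read_ts becomes 7
print(basic_to('write', 6, x))  # 'abort' : x already read by the younger T7
```

An aborted transaction restarts with a fresh, larger timestamp, after which the same operations succeed.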

53 Basic Timestamp Ordering

54 Comparison of methods

55 Multiversion Timestamp Ordering
Versioning of data can be used to increase concurrency
The basic timestamp ordering protocol assumes only one version of each data item exists, so only one transaction can access a data item at a time
Multiversion timestamp ordering allows multiple transactions to read and write different versions of the same data item
It ensures each transaction sees a consistent set of versions for all the data items it accesses

56 Multiversion Timestamp Ordering
In multiversion concurrency control:
Each write operation creates a new version of a data item while retaining the old version
When a transaction attempts to read a data item, the system selects one version that ensures serializability

57 Multiversion Timestamp Ordering
The database holds n versions of each data item
Each version has 3 values:
The value of version xi
read_timestamp(xi): the largest timestamp of the transactions that have successfully read xi
write_timestamp(xi): the timestamp of the transaction that created version xi

58 Multiversion Timestamp Ordering
Versions can be deleted once they are no longer required:
Find the timestamp of the oldest transaction in the system
For any two versions xi and xj of data item x with write timestamps less than this oldest timestamp, the older version can be deleted
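The version-selection step for reads can be sketched in a few lines (illustrative Python, not the full MVTO protocol: a reader sees the version with the largest write timestamp not exceeding its own timestamp; versions are held as a write_ts-to-value dict, which is my own representation):

```python
def read_version(versions, ts):
    """MVTO read sketch: return the value of the version whose write
    timestamp is the largest one not exceeding the reader's timestamp."""
    eligible = [wts for wts in versions if wts <= ts]
    return versions[max(eligible)] if eligible else None

# Versions of bal_x created at timestamps 0, 5, and 9.
versions = {0: 100, 5: 90, 9: 200}
print(read_version(versions, 7))   # 90  : T7 sees the version written at ts 5
print(read_version(versions, 12))  # 200 : T12 sees the newest version
```

Because old versions are retained, the read never needs to abort the way a too-late read does under basic timestamp ordering.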

59 Optimistic Techniques
Based on the assumption that conflict is rare, and that it is more efficient to let transactions proceed without the delays needed to ensure serializability
At commit, a check is made to determine whether conflict has occurred
If there is conflict, the transaction must be rolled back and restarted
Potentially allows greater concurrency than traditional protocols
Three phases:
Read
Validation
Write

60 Optimistic Techniques - Read Phase
Extends from the start of the transaction until immediately before commit
The transaction reads values from the database and stores them in local variables
Updates are applied to a local copy of the data

61 Optimistic Techniques - Validation Phase
Follows the read phase
For a read-only transaction, checks that the data read are still the current values; if there was no interference, the transaction is committed, otherwise it is aborted and restarted
For an update transaction, checks that the transaction leaves the database in a consistent state, with serializability maintained

62 Optimistic Techniques - Write Phase
Follows a successful validation phase for update transactions
The updates made to the local copy are applied to the database
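One common way to realize the validation phase is backward validation, sketched here in Python (an illustrative simplification, not the slides' own algorithm: the committing transaction passes only if no transaction that committed during its read phase wrote an item it read):

```python
def validate(read_set, overlapping_commit_write_sets):
    """Backward validation sketch: fail if any write set of a transaction
    that committed during our read phase intersects our read set."""
    return all(not (read_set & ws) for ws in overlapping_commit_write_sets)

# T read bal_x and bal_y; a concurrent commit wrote only bal_z: no conflict.
print(validate({'bal_x', 'bal_y'}, [{'bal_z'}]))  # True : proceed to write phase
# A concurrent commit wrote bal_x, which T read: T must abort and restart.
print(validate({'bal_x', 'bal_y'}, [{'bal_x'}]))  # False: abort and restart
```

On success, the write phase installs the local copies; on failure, only the local work is discarded, so no undo of the database itself is needed.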

63 Granularity of Data Items
The size of the data items chosen as the unit of protection by the concurrency control protocol
Ranging from coarse to fine:
The entire database
A file
A page (or area or database space)
A record
A field value of a record

64 Granularity of Data Items
Tradeoff:
The coarser the granularity, the lower the degree of concurrency
The finer the granularity, the more locking information needs to be stored
The best item size depends on the types of transactions

65 Hierarchy of Granularity
Represent the granularity of locks in a hierarchical structure
The root node represents the entire database, level-1 nodes represent files, and so on
When a node is locked, all its descendants are also locked
The DBMS should check the hierarchical path before granting a lock

66 Hierarchy of Granularity
Intention lock: used to lock all ancestors of a locked node
Intention locks may be read or write
Applied top-down, released bottom-up

67 Hierarchy of Granularity
2PL is used to ensure serializability:
No lock is granted once any node has been unlocked
No node may be locked until its parent is locked by an intention lock
No node may be unlocked until all its descendants are unlocked

68 Levels of Locking

69 Database Recovery
The process of restoring the database to a correct state in the event of a failure
Need for recovery control:
Two types of storage: volatile (main memory) and nonvolatile
Volatile storage does not survive system crashes
Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes

70 Types of Failures
System crashes, resulting in loss of main memory
Media failures, resulting in loss of parts of secondary storage
Application software errors
Natural physical disasters
Carelessness or unintentional destruction of data or facilities
Sabotage

71 Transactions and Recovery
Transactions represent the basic unit of recovery
The recovery manager is responsible for atomicity and durability
If a failure occurs between a commit and the database buffers being flushed to secondary storage, then, to ensure durability, the recovery manager has to redo (rollforward) the transaction's updates

72 Transactions and Recovery
If a transaction had not committed at failure time, the recovery manager undoes (rolls back) any effects of that transaction, for atomicity
Partial undo: only one transaction has to be undone
Global undo: all transactions have to be undone

73 Example
The DBMS starts at time t0 but fails at time tf
Assume the data for transactions T2 and T3 have been written to secondary storage
T1 and T6 have to be undone
In the absence of any other information, the recovery manager has to redo T2, T3, T4, and T5

74 Recovery Facilities
The DBMS should provide the following facilities to assist with recovery:
Backup mechanism: makes periodic backup copies of the database
Logging facilities: keep track of the current state of transactions and database changes
Checkpoint facility: enables updates to the database that are in progress to be made permanent
Recovery manager: allows the DBMS to restore the database to a consistent state following a failure

75 Log File
Contains information about all updates to the database:
Transaction records
Checkpoint records
Often used for other purposes, e.g. auditing
Transaction records contain:
Transaction identifier
Type of log record (transaction start, insert, update, delete, abort, commit)
Identifier of the data item affected by the database action (insert, delete, and update operations)
Before-image of the data item
After-image of the data item
Log management information

76 Sample Log File

77 Log File
The log file may be duplexed or triplexed
The log file is sometimes split into two separate random-access files
The log is a potential bottleneck, and is critical in determining overall performance

78 Checkpointing
Checkpoint: a point of synchronization between the database and the log file; all buffers are force-written to secondary storage
A checkpoint record is created containing the identifiers of all active transactions
When a failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at the time of the crash

79 Checkpointing
With a checkpoint at time tc, the changes made by T2 and T3 have been written to secondary storage
Thus we:
only redo T4 and T5
undo transactions T1 and T6

80 Recovery Techniques
If the database has been damaged:
Need to restore the last backup copy of the database
Reapply the updates of committed transactions using the log file
If the database is inconsistent:
Need to undo the changes that caused the inconsistency
May also need to redo some transactions to ensure the updates reach secondary storage
Do not need the backup, but can restore the database using the before- and after-images in the log file
Three main recovery techniques:
Deferred Update
Immediate Update
Shadow Paging

81 Deferred Updates
Updates are not written to the database until after a transaction has reached its commit point
If a transaction fails before commit, it has not modified the database, so no undoing of changes is required
It may be necessary to redo the updates of committed transactions, as their effect may not have reached the database

82 Immediate Update
Updates are applied to the database as they occur
Need to redo the updates of committed transactions following a failure
May need to undo the effects of transactions that had not committed at the time of failure
Essential that log records are written before the write to the database: the write-ahead log protocol

83 Immediate Update
If there is no "transaction commit" record in the log, the transaction was active at failure and must be undone
Undo operations are performed in the reverse order to that in which they were written to the log
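The redo/undo passes over the log can be sketched in Python (an illustrative simplification of immediate-update recovery, not from the slides: log records are tuples, the "database" is a dict, and each item is assumed to be touched by at most one transaction):

```python
def recover(log):
    """Immediate-update recovery sketch over a log of records:
    ('start', T), ('update', T, item, before_image, after_image), ('commit', T).
    Transactions with a commit record are redone; the rest are undone."""
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    db = {}
    for rec in log:                        # redo pass: forward through the log
        if rec[0] == 'update' and rec[1] in committed:
            db[rec[2]] = rec[4]            # apply the after-image
    for rec in reversed(log):              # undo pass: backward through the log
        if rec[0] == 'update' and rec[1] not in committed:
            db.setdefault(rec[2], rec[3])  # restore the before-image
    return db

log = [('start', 'T1'), ('update', 'T1', 'bal_x', 100, 90), ('commit', 'T1'),
       ('start', 'T2'), ('update', 'T2', 'bal_y', 50, 60)]  # T2 active at crash
print(recover(log))  # {'bal_x': 90, 'bal_y': 50}: T1 redone, T2 undone
```

Note that the undo pass walks the log in reverse, matching the rule that undo operations are performed in the reverse order to that in which they were logged.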

84 Shadow Paging
Maintain two page tables during the life of a transaction:
Current page table
Shadow page table
At the start of the transaction, the two page tables are the same
The shadow page table is never changed thereafter, and is used to restore the database in the event of failure
During the transaction, the current page table records all updates to the database
When the transaction completes, the current page table becomes the shadow page table
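The two-table mechanism can be sketched as a small class (illustrative Python; real shadow paging swaps page pointers on disk rather than copying dicts, so this only models the visible behaviour):

```python
class ShadowPagedDB:
    """Shadow-paging sketch: the shadow page table is fixed at transaction
    start; updates go to the current page table; commit makes the current
    table the new shadow; abort restores from the shadow."""
    def __init__(self, pages):
        self.shadow = dict(pages)         # never changed during the transaction
        self.current = dict(pages)        # receives all updates

    def write(self, page, value):
        self.current[page] = value

    def commit(self):
        self.shadow = dict(self.current)  # current page table becomes the shadow

    def abort(self):
        self.current = dict(self.shadow)  # discard updates; shadow restores state

db = ShadowPagedDB({'p1': 'old'})
db.write('p1', 'new')
db.abort()
print(db.current['p1'])   # 'old': the shadow table restored the prior state
db.write('p1', 'new')
db.commit()
print(db.shadow['p1'])    # 'new': the update is now durable
```

Because the shadow table is untouched until commit, no log-based undo is needed after a crash: the shadow table simply remains the authoritative state.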

85 Advanced Transaction Models
The protocols discussed are suitable for types of transactions characterized by:
Simple nature of data: integers, short character strings, dates, etc.
Short duration of transactions
More advanced database applications are characterized by:
Large designs
Dynamic designs, where changes are propagated through all representations
Far-reaching updates
Cooperative engineering: several people working in parallel on multiple versions of a design

86 Advanced Transaction Models
These applications may result in transactions of long duration
Potential problems:
More susceptible to failure: need to minimize the amount of work lost
May access a large number of data items: concurrency is limited if data is inaccessible for long periods
Deadlocks are more likely
Cooperation through the use of shared data items is restricted by traditional concurrency protocols

87 Advanced Transaction Models
We look at five advanced transaction models:
Nested Transaction Model
Sagas
Multi-level Transaction Model
Dynamic Restructuring
Workflow Models

88 Nested Transaction Model
A transaction is viewed as a hierarchy of subtransactions
A top-level transaction can have a number of child transactions, and each child can have its own nested transactions
Only leaf-level subtransactions are allowed to perform database operations
Transactions commit from the bottom upwards
A transaction abort at one level does not have to affect a transaction in progress at a higher level

89 Nested Transaction Model
A parent is allowed to perform its own recovery:
Retry the subtransaction
Ignore the failure, if the subtransaction is non-vital
Run a contingency subtransaction
Abort
Updates of committed subtransactions at intermediate levels are visible only within the scope of their immediate parents
Commit of a subtransaction is conditional, subject to the commit or abort of its superiors
Top-level transactions conform to the traditional ACID properties of a flat transaction

90 Nested Transaction Example

91 Nested Transactions
Advantages:
Modularity: a transaction can be decomposed into a number of subtransactions for the purposes of concurrency and recovery
A finer level of granularity for concurrency control and recovery
Intra-transaction parallelism
Intra-transaction recovery control

92 Savepoints
Savepoint: an identifiable point in a flat transaction representing some partially consistent state
Used as a restart point for the transaction if a subsequent problem is detected
During the execution of a transaction, the user can establish a savepoint, which the user can later roll the transaction back to
Unlike nested transactions, savepoints do not support any form of intra-transaction parallelism

93 Sagas
"A sequence of (flat) transactions that can be interleaved with other transactions"
The DBMS guarantees that either:
All transactions in the saga are successfully completed, or
Compensating transactions are run to undo the partial execution
A saga has only one level of nesting
For every subtransaction defined, there is a corresponding compensating transaction that will semantically undo the subtransaction's effect

94 Sagas
Sagas relax the property of isolation by allowing a saga to reveal its partial results to other concurrently executing transactions before it completes
Useful when the subtransactions are relatively independent and compensating transactions can be produced
It may sometimes be difficult to define a compensating transaction in advance, and the DBMS may need to interact with the user to determine the compensation
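The saga guarantee, run every step or compensate the completed ones in reverse, can be sketched directly (illustrative Python; the step and compensation functions here are hypothetical examples, not from the slides):

```python
def run_saga(steps):
    """Saga sketch: steps is a list of (action, compensation) pairs.
    If an action raises, the compensations for all completed steps are
    run in reverse order, semantically undoing the partial execution."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):   # undo completed steps, newest first
                comp()
            return 'compensated'
    return 'committed'

state = {'bal_x': 100, 'bal_y': 50}

def debit():     # first subtransaction: withdraw 10 from bal_x
    state['bal_x'] -= 10

def undebit():   # its compensating transaction: put the 10 back
    state['bal_x'] += 10

def fail():      # second subtransaction fails partway through the saga
    raise RuntimeError('credit step failed')

result = run_saga([(debit, undebit), (fail, None)])
print(result, state['bal_x'])  # compensated 100: the debit was undone
```

Unlike a rollback, the compensation runs as a regular transaction after the fact, which is why other transactions may have already seen the saga's partial results.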

95 Dynamic Restructuring
To address the constraints imposed by the ACID properties of flat transactions, two new operations were proposed: split_transaction and join_transaction
split_transaction: splits a transaction into two serializable transactions and divides its actions and resources (for example, locked data items) between the new transactions
The resulting transactions proceed independently

96 Dynamic Restructuring
Allows the partial results of a transaction to be shared, while preserving its transaction semantics
Applicable only when it is possible to generate two transactions that are serializable with each other and with all other concurrently executing transactions

97 Dynamic Restructuring
Conditions that permit a transaction to be split into A and B are:
AWriteSet ∩ BWriteSet ⊆ BWriteLast
If both A and B write to the same object, B's write operations must follow A's write operations
AReadSet ∩ BWriteSet = ∅
A cannot see any results from B
BReadSet ∩ AWriteSet = ShareSet
B may see the results of A
These conditions guarantee that A is serialized before B
However, if A aborts, B must also abort
If both BWriteLast and ShareSet are empty, then A and B can be serialized in any order and both can be committed independently

98 Dynamic Restructuring
join_transaction: performs the reverse operation, merging the ongoing work of two or more independent transactions, as though they had always been a single transaction
The main advantages of dynamic restructuring are:
Adaptive recovery
Reducing isolation

99 Covers Chapter 20 up to 20.5

