1/2/99 1 1. Introduction CSE 593 Transaction Processing Philip A. Bernstein.

Slides:



Advertisements
Similar presentations
TRANSACTION PROCESSING SYSTEM ROHIT KHOKHER. TRANSACTION RECOVERY TRANSACTION RECOVERY TRANSACTION STATES SERIALIZABILITY CONFLICT SERIALIZABILITY VIEW.
Advertisements

Transactions (Chapter ). What is it? Transaction - a logical unit of database processing Motivation - want consistent change of state in data Transactions.
(c) Oded Shmueli Transactions Lecture 1: Introduction (Chapter 1, BHG) Modeling DB Systems.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
Database Concepts Lec. 5. What Is a Database? Data are unprocessed raw facts that include text, number, images, audio, and video. Information is processed.
Transaction Management Overview. Transactions Concurrent execution of user programs is essential for good DBMS performance. –Because disk accesses are.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 16 – Intro. to Transactions.
Transaction Processing Lecture ACID 2 phase commit.
CS 501: Software Engineering Fall 2000 Lecture 14 System Architecture I Data Intensive Systems.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Distributed Snapshots –Termination detection Election algorithms –Bully –Ring.
Chapter 15 Transaction Management. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Transaction basics Concurrency.
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 15: Data Replication with Network Partitions & Transaction Processing Systems.
1 Transaction Management Overview Yanlei Diao UMass Amherst March 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Transaction Processing IS698 Min Song. 2 What is a Transaction?  When an event in the real world changes the state of the enterprise, a transaction is.
Chapter 8 : Transaction Management. u Function and importance of transactions. u Properties of transactions. u Concurrency Control – Meaning of serializability.
1 ACID Properties of Transactions Chapter Transactions Many enterprises use databases to store information about their state –e.g., Balances of.
The Architecture of Transaction Processing Systems
Transactions Amol Deshpande CMSC424. Today Project stuff… Summer Internships 
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 18: Replication Control All slides © IG.
1 Recap Database: –collection of data central to some enterprise that is managed by a Database Management System –reflection of the current state of the.
Transaction Management WXES 2103 Database. Content What is transaction Transaction properties Transaction management with SQL Transaction log DBMS Transaction.
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 8A Transaction Concept.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
TRANSACTION PROCESSING TECHNIQUES BY SON NGUYEN VIJAY RAO.
Quick Review of Apr 24 material Sorting (Sections 13.4) Sort-merge Algorithm for external sorting Join Operation implementations (sect. 13.5) –Size estimation.
Transaction. A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the.
4/25/ Application Server Issues for the Project CSEP 545 Transaction Processing for E-Commerce Philip A. Bernstein Copyright ©2003 Philip A. Bernstein.
Distributed Databases
Transactions and Recovery
INTRODUCTION TO TRANSACTION PROCESSING CHAPTER 21 (6/E) CHAPTER 17 (5/E)
Computer System Lifecycle Chapter 1. Introduction Computer System users, administrators, and designers are all interested in performance evaluation. Whether.
Advanced Database Technologies Lecture 6: Transactions and Database Recovery.
Introduction. 
Client Server Technologies Middleware Technologies Ganesh Panchanathan Alex Verstak.
DB Transactions CS143 Notes TRANSACTION: A sequence of SQL statements that are executed "together" as one unit:
Transactions. 421B: Database Systems - Transactions 2 Transaction Processing q Most of the information systems in businesses are transaction based (databases.
02/23/2005Yan Huang - CSCI5330 Database Implementation – Transaction Transaction.
Chapter 15 Recovery. Topics in this Chapter Transactions Transaction Recovery System Recovery Media Recovery Two-Phase Commit SQL Facilities.
Ch 10: Transaction Management and Concurrent Control.
Concurrency Control in Database Operating Systems.
5/22/ Business Process Management CSEP 545 Transaction Processing Philip A. Bernstein Copyright ©2007 Philip A. Bernstein.
Chapter 1 Introduction to Databases. 1-2 Chapter Outline   Common uses of database systems   Meaning of basic terms   Database Applications  
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
©Silberschatz, Korth and Sudarshan15.1Database System Concepts Chapter 15: Transactions Transaction Concept Transaction State Implementation of Atomicity.
INTRODUCTION TO DBS Database: a collection of data describing the activities of one or more related organizations DBMS: software designed to assist in.
Transactions and Concurrency Control. Concurrent Accesses to an Object Multiple threads Atomic operations Thread communication Fairness.
1 Lecture 3: Transactions and Recovery Transactions (ACID) Recovery Advanced Databases CG096 Nick Rossiter [Emma-Jane Phillips-Tait]
Transactions. Transaction: Informal Definition A transaction is a piece of code that accesses a shared database such that each transaction accesses shared.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts - 6 th Edition Chapter 14: Transactions Transaction Concept Transaction State Concurrent.
Transaction Management Overview. Transactions Concurrent execution of user programs is essential for good DBMS performance. – Because disk accesses are.
Transactions.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 14: Transactions.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Multidatabase Transaction Management COP5711. Multidatabase Transaction Management Outline Review - Transaction Processing Multidatabase Transaction Management.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 16 – Intro. to Transactions.
18 September 2008CIS 340 # 1 Last Covered (almost)(almost) Variety of middleware mechanisms Gain? Enable n-tier architectures while not necessarily using.
OPERATING SYSTEM OVERVIEW. Contents   O.S.Functions   The Evolution of O.S.   Characteristics of O.S.   Basic hardware elements.
SYSTEMS IMPLEMENTATION TECHNIQUES TRANSACTION PROCESSING DATABASE RECOVERY DATABASE SECURITY CONCURRENCY CONTROL.
Transaction processing systems
Transactions.
On-Line Transaction Processing
Transactions Properties.
Outline Introduction Background Distributed DBMS Architecture
Lecture 21: Replication Control
Terms: Data: Database: Database Management System: INTRODUCTION
Presentation transcript:

1/2/ Introduction CSE 593 Transaction Processing Philip A. Bernstein

1/2/99 2 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/ The Basics - What’s a Transaction? The execution of a program that performs an administrative function by accessing a shared database, usually on behalf of an on-line user. Examples Reserve an airline seat. Buy an airline ticket Withdraw money from an ATM. Verify a credit card sale. Place an order using an on-line catalog on the Internet Fire a missile Download a video clip

1/2/99 4 What Makes Transaction Processing (TP) Hard? Reliability - system should rarely fail Availability - system must be up all the time Response time - within 1-2 seconds Throughput - thousands of transactions/second Scalability - start small, ramp up to Internet-scale Configurability - for above requirements + low cost Atomicity - no partial results Durability - a transaction is a legal contract Distribution - of users and data

1/2/99 5 What Makes TP Important? Most medium-to-large businesses use TP for their production systems. The business can’t operate without it. It’s a huge slice of the computer system market — over $50B/year. Probably the single largest application of computers. It’s the glue that enables Internet-based commerce

1/2/99 6 TP System Infrastructure From the user’s viewpoint, it –Gets a request from a display or other device –Performs some application-specific work, which includes database accesses –Usually, returns a reply It ensures each transaction is an independent unit of work that executes exactly once and produces permanent results. Makes it easy to write transactions

1/2/99 7 TP System Infrastructure … Defines System and Application Structure Presentation Manager Workflow Control (routes requests) Database System Front-End (Client) Back-End (Server) End-User Transaction Program requests

1/2/99 8 System Characteristics Typically < 100 transaction types per application Transaction size has high variance. Typically, –0-30 disk accesses –10K - 1M instructions executed –2-20 messages A large-scale example: airline reservations –150,000 active display devices –thousands of disk drives –3000 transactions per second, peak

1/2/99 9 TP Monitors A software product to create, execute and manage TP applications Takes an application written to process a single request and scales it up to a large, distributed system –E.g. application developer writes programs to debit a checking account and verify a credit card purchase. –TP monitor helps system engineer deploy it to 10s/100s of servers and 10Ks of displays

1/2/99 10 TP Monitors (cont’d) Includes an application programming interface (API), and tools for program development and system management

1/2/99 11 Presentation Server Workflow Controller Transaction Server Network Requests Message Inputs TP Monitor Architecture Boxes below can be distributed around a network Queues

1/2/99 12 Automated Teller Machine (ATM) Application Example Workflow Controller CIRRUS Accounts Credit Card Accounts Loan Accounts Workflow Controller ATM Bank Branch 1Bank Branch 2 Bank Branch 500 Checking Accounts

1/2/99 13 Automated Stock Exchange Stock Dealer 1 Stock Exchange 1 Stock Exchange 2 Stock Exchange 10 Securities 1-20 Securities Stock Dealer Workflow Controller Workflow Controller

1/2/99 14 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/ The ACID Properties Transactions have 4 main properties –Atomicity - all or nothing –Consistency - preserve database integrity –Isolation - execute as if they were run alone –Durability - results aren’t lost by a failure

1/2/99 16 Atomicity All-or-nothing, no partial results. –E.g. in a money transfer, debit one account, credit the other. Either debit and credit both run, or neither runs. –Successful completion is called Commit. –Transaction failure is called Abort. Commit and abort are irrevocable actions. An Abort undoes operations that already executed –For database operations, restore the data’s previous value from before the transaction –But some real world operations are not undoable. Examples - transfer money, print ticket, fire missile

1/2/99 17 Example - ATM Dispenses Money a non-undoable operation T1: Start... Commit Dispense Money System crashes Transaction aborts Money is dispensed T1: Start... Dispense Money Commit Deferred operation never gets executed

1/2/99 18 Reading Uncommitted Output Isn’t Undoable T1: Start... Display output... If error, Abort T2: Start Get input from display... Commit User reads output … User enters input Brain transport

1/2/99 19 Compensating Transactions A transaction that reverses the effect of another transaction (that committed). For example, –“Adjustment” in a financial system –Annul a marriage Not all transactions have complete compensations –E.g. Certain money transfers (cf. “The Firm”) –E.g. Fire missile, cancel contract –Contract law has a lot to say about appropriate compensations A well-designed TP application should have a compensation for every transaction type

1/2/99 20 Consistency Every transaction should maintain DB consistency –Referential integrity - E.g. each order references an existing customer number and existing part numbers –The books balance (debits = credits, assets = liabilities) Consistency preservation is a property of a transaction, not of the TP system (unlike the A, I, and D of ACID) If each transaction maintains consistency, then serial executions of transactions do too.

1/2/99 21 Some Notation r i [x] = Read(x) by transaction T i w i [x] = Write(x) by transaction T i c i = Commit by transaction T i a i = Abort by transaction T i A history is a sequence of such operations, in the order that the database system processed them.

1/2/99 22 Consistency Preservation Example T 1 : Start; A = Read(x); A = A - 1; Write(y, A); Commit; T 2 : Start; B = Read(x); C = Read(y); If (B > C+1) then B = B - 1; Write(x, B); Commit; Consistency predicate is x > y. Serial executions preserve consistency. Interleaved executions may not. H = r 1 [x] r 2 [x] r 2 [y] w 2 [x] w 1 [y] –e.g. try it with x=4 and y=2 initially

1/2/99 23 Isolation Intuitively, the effect of a set of transactions should be the same as if they ran independently Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one. Implies a user view where the system runs each user’s transaction stand-alone. Of course, transactions in fact run with lots of concurrency, to use device parallelism.

1/2/99 24 A Serializability Example T 1 : Start; A = Read(x); A = A + 1; Write(x, A); Commit; T 2 : Start; B = Read(x); B = B + 1; Write(y, B); Commit; H = r 1 [x] r 2 [x] w 1 [x] c 1 w 2 [y] c 2 H is equivalent to executing T 2 followed by T 1 Note, H is not equivalent to T 1 followed by T 2 Also, note that T 1 started and finished before T 2, yet the effect is that T 2 ran first.

1/2/99 25 Serializability Examples (cont’d) Client must control the relative order of transactions, using handshakes (wait for T 1 to commit before submitting T 2 ). Some more serializable executions: r 1 [x] r 2 [y] w 2 [y] w 1 [x]  T 1 T 2  T 2 T 1 r 1 [y] r 2 [y] w 2 [y] w 1 [x]  T 1 T 2  T 2 T 1 r 1 [x] r 2 [y] w 2 [y] w 1 [y]  T 2 T 1  T 1 T 2 Serializability says the execution is equivalent to some serial order, not necessarily to all serial orders

1/2/99 26 Non-Serializable Examples r 1 [x] r 2 [x] w 2 [x] w 1 [x] (race condition) –e.g. T 1 and T 2 are each adding 100 to x r 1 [x] r 2 [y] w 2 [x] w 1 [y] –e.g. each transaction is trying to make x = y, but the interleaved effect is a swap r 1 [x] r 1 [y] w 1 [x] r 2 [x] r 2 [y] c 2 w 1 [y] c 1 (inconsistent retrieval) –e.g. T 1 is moving $100 from x to y. –T 2 sees only half of the result of T 1 Compare to OS view of synchronization

1/2/99 27 Durability When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk). Makes it possible for a transaction to be a legal contract. Implementation is usually via a log –DB system writes all transaction updates to its log –to commit, it adds a record “commit(T i )” to the log –when the commit record is on disk, the transaction is committed. –system waits for disk ack before acking to user

1/2/99 28 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/ Atomicity and Two-Phase Commit Distributed systems make atomicity harder Suppose a transaction updates data managed by two DB systems. One DB system could commit the transaction, but a failure could prevent the other system from committing. The solution is the two-phase commit protocol. Abstract “DB system” by resource manager (could be a message mgr, queue mgr, etc.)

1/2/99 30 Two-Phase Commit Main idea - all resource managers (RMs) save a durable copy of the transaction’s updates before any of them commit. If one resource manager fails after another commits, the failed system can still commit after it recovers. The protocol to commit transaction T –Phase 1 - T’s coordinator asks all participant RMs to “prepare the transaction”. Participant RMs replies “prepared” after the T’s updates are durable. –Phase 2 - After receiving “prepared” from all participant RMs, the coordinator tells all participant RMs to commit.

1/2/99 31 Two-Phase Commit System Architecture Resource Manager Transaction Manager Application Program Other Transaction Managers 1. Start transaction (txn) returns a unique transaction identifier 2. Resource accesses include the transaction identifier. For each txn, resource manager registers with txn manager 3. When application asks transaction manager to commit, the transaction manager runs two-phase commit.

1/2/99 32 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/ Availability Fraction of time system is able to do useful work Some systems are very sensitive to downtime –airline reservation, stock exchange, telephone switching Contributing factors –failures due to environment, system mgmt, h/w, s/w –recovery time DowntimeAvailability 1 hour/day95.8% 1 hour/week99.41% 1 hour/month99.86% 1 hour/year % 1 hour/20years %

1/2/ Performance Requirements Measured in max transaction per second (tps) or per minute (tpm), and dollars per tps or tpm. TP Performance Council (TPC) sets standards – Standards TPC A & B (‘89-’95), now TPC-C TPC A & B models a bank teller system. –Obsolete (a retired standard), but interesting –TPC-A includes terminals and comm. TPC-B is DB only –input is 100 byte message requesting deposit/withdrawal –Database tables = {Accounts, Tellers, Branches, History}

1/2/99 35 TPC-A/B Transaction Program Start Read message from terminal (100 bytes) Read+write account record (random access) Write history record (sequential access) Read+write teller record (random access) Read+write branch record (random access) Write message to terminal (200 bytes) Commit Database (DB) has 100K accounts / tps Ô Accounts table must be indexed End of history and branch records are bottlenecks

1/2/99 36 TPC-A/B numbers TPC-A/B had a huge effect on products. Drove down transaction resource consumption to –less than 100K instructions –2.1 forced disk I/Os per transaction –2 display I/Os per transaction Dollars measured by list purchase price plus 5 year vendor maintenance (“cost of ownership”) Workload has this profile: –10% TP monitor plus application –30% communications system (not counting presentation) –50% DB system

1/2/99 37 The TPC-C Order-Entry Benchmark TPC-C uses heavier weight transactions

1/2/99 38 TPC-C Transactions New-Order –Get records describing given warehse, customer, district –Update the district –Increment next available order number –Insert record in Order and New-Order tables –For 5-15 items, get Item record, get/update Stock record –Insert Order-Line Record Payment, Order-Status, Delivery, Stock-Level have similar complexity, with different frequencies tpmC = number of New-Order transaction per min.

1/2/99 39 Comments on TPC-C Enables apples-to-apples comparison of TP systems Does not predict how your application will run, or how much hardware you will need, or even which system will work best on your workload Not all vendors optimize for TPC-C. E.g., IBM has claimed DB2 is optimized for a different workload, so they have only recently published TPC numbers UI is typically HTML, for Microsoft at least.

1/2/99 40 Typical TPC-C Numbers $33 - $200 / tpmC. Uniform spread across the range. –All $33-50 results on MS SQL Server. –Some $50-60 on Sybase. No one else is under $85. System cost $153K (Intergraph) - $7M (Sun) Throughput K tpmC –Sun 52K tpmC, $7M, $135/tpmC (Oracle 8, Tuxedo 4.2) Typical low-end - Compaq ProLiant –3.9K tpmC, $206K, $53/tpmC –10.5K tpmC, $351K, $34/tpmc –SQL Server 6.5 and BEA Systems Tuxedo Results are very sensitive to date published.

1/2/99 41 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/ TP is a Style of System Batch processing - You submit a job, and receive output as a file. Time sharing - You invoke programs in a process, which may interact with the process’s display Real time - You submit requests (very short jobs) which must be processed by a deadline Client/server - PC client calls a server over a network to do work (access files, run applications) Decision support - You submit queries to a shared database, and process the result with desktop tools TP - You submit a request to run a transaction

1/2/99 43 TP vs. Batch Processing (BP) A BP application is usually uniprogrammed (one transaction at a time), so serializability is trivial. TP is multiprogrammed with much concurrency. BP performance is measured by throughput. TP is also measured by response time. BP can optimize by sorting transactions by the file key, so it can merge the transactions with the file. TP must handle random transaction arrivals. BP produces new output file. To recover from a failure, just re-run the application. BP has fixed and predictable load, unlike TP.

1/2/99 44 TP vs. Batch Processing (cont’d) But, where there is TP, there is almost always BP too. –TP gathers the input –BP post-processes work that has weak response time requirements So, TP systems must also do BP well.

1/2/99 45 TP vs. Timesharing (TS) TS is a utility with highly unpredictable load. Different programs run each day, exercising features in new combinations. By comparison, TP is highly regular. TS has less stringent availability and atomicity requirements. Downtime isn’t as expensive.

1/2/99 46 TP vs. Real Time (RT) RT has more stringent response time requirements. It may control a physical process. RT deals with more specialized devices. RT doesn’t need or use a transaction abstraction –usually loose about atomicity and serializability RT has less stringent correctness guarantees. It’s usually tolerable to lose some requests, to help meet response time goals.

1/2/99 47 TP and Client/Server (C/S) Is commonly used for TP, where client prepares requests and server runs transactions In a sense, TP systems were the first C/S systems, where the client was a terminal

1/2/99 48 TP and Decision Support Systems (DSSs) A.k.a. data warehouse (DSS is the more generic term, i.e. DW is a kind of DSS.) TP systems provide the raw data for DSSs TP runs short updates with high data integrity requirements. DSSs run long queries, usually with lower data integrity requirements

1/2/99 49 Outline 1. The Basics 2. ACID Properties 3. Atomicity and Two-Phase Commit 4. Availability 5. Performance 6. Styles of System

1/2/99 50 What’s Next? This section covered TP system structure and properties of transactions and TP systems The rest of the course drills deeply into each of these areas, one by one.