CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 8: Sept. 19.

Transactions Continued
We saw
–Transactions on a single server
–2-phase locking
–2-phase commit protocol
Set commit to the side (much more we can do with it)
Today stay focused on the model
–So-called “nested” transactions
–Issues of availability and fault-tolerance

Transactions on distributed objects
Idea was proposed by Liskov’s Argus group
Each object translates an abstract set of operations into the concrete operations that implement it
Result is that object invocations may “nest”:
–A library “update” operation does
–A series of file read and write operations, which do
–A series of accesses to the disk device

Argus Language
Structured, like Java, around objects
But objects can declare “persistent” state
–Stored on a disk
–Each time the object is touched, its state must be retrieved and perhaps updated
Language features include begin, commit, abort
“Can’t commit” is an example of an exception Argus can throw

Argus premises Idea was that distributed computing needs distributed programming tools Languages are powerful tools Reliability inevitably an issue –Can we successfully treat the whole system as a set of O-O applications? –Can one imagine a complex networked system built using a single language? Note that in Java we don’t get similar transactional features –Do these need to be in the language?

Nested transactions In a world of objects, each object operates by –operations on other objects –primitive operations on data If objects are transactional, how can we handle “nested” transactions? Argus extends the transactional model to address this case

Nested transactions Call the traditional style of flat transaction a “top level” transaction –Argus short hand: “actions” The main program becomes the top level action Within it objects run as nested actions

Arguments for nested transactions It makes sense to treat each object invocation as a small transaction: begin when the invocation is done, and commit or abort when result is returned –Can use abort as a “tool”: try something; if it doesn’t work just do an abort to back out of it. –Turns out we can easily extend transactional model to accommodate nested transactions Liskov argues that in this approach we have a simple conceptual framework for distributed computing

Nested transactions: picture
[Figure: top-level transaction T1 (fetch(“ken”) … set_salary(“ken”, …) … commit) expands into file operations (open_file … seek … read … seek … write …), which expand into lower-level operations …]

Observations Can number operations using the obvious notation: T1, T1.1, … Subtransaction commit should make results visible to the parent transaction Subtransaction abort should return to the state when the subtransaction (not the parent) was initiated Data managers maintain a stack of data versions

Stacking rule Abstractly, when subtransaction starts, we push a new copy of each data item on top of the stack for that item When subtransaction aborts we pop the stack When subtransaction commits we pop two items and push top one back on again In practice, can implement this much more efficiently!!!
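The stacking rule above can be sketched in Python. This is an illustrative data structure, not Argus's actual implementation, and the class and method names are hypothetical:

```python
# Sketch of the stacking rule for nested-transaction versions.
class VersionedItem:
    def __init__(self, value):
        self.stack = [value]  # bottom entry is the committed top-level value

    def begin_sub(self):
        # Subtransaction starts: push a fresh copy for it to modify.
        self.stack.append(self.stack[-1])

    def write(self, value):
        self.stack[-1] = value

    def abort_sub(self):
        # Subtransaction aborts: discard its version.
        self.stack.pop()

    def commit_sub(self):
        # Subtransaction commits: pop two, push the subtransaction's
        # version back, so it replaces the parent's version.
        top = self.stack.pop()
        self.stack[-1] = top

x = VersionedItem(6)
x.begin_sub(); x.write(7); x.commit_sub()   # parent now sees 7
x.begin_sub(); x.write(9); x.abort_sub()    # 9 is discarded
assert x.stack == [7]
```

As the slide notes, a real data manager would not copy whole items; it would keep deltas or version pointers and achieve the same visibility rules far more cheaply.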

Data objects viewed as “stacks”
[Figure: version stacks for data items x, y, and z, with entries tagged T0, T1, and T1.1] Transaction T0 wrote 6 into x Transaction T1 spawned subtransactions that wrote new values for y and z

Locking rules? When subtransaction requests lock, it should be able to obtain locks held by its parent Subtransaction aborts, locks return to “prior state” Subtransaction commits, locks retained by parent... Moss has shown that this extended version of 2-phase locking guarantees serializability of nested transactions
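Moss's locking rules can be sketched as a toy model. This is a hedged illustration (real implementations track read/write lock modes, queues, and abort handling as well); the names are hypothetical:

```python
# Sketch of Moss's lock-inheritance rules for nested transactions.
# Transactions form a tree; each lock records its current holder.
class Txn:
    def __init__(self, parent=None):
        self.parent = parent

    def ancestors(self):
        t = self
        while t is not None:
            yield t
            t = t.parent

class Lock:
    def __init__(self):
        self.holder = None

    def acquire(self, txn):
        # Grant if free, already held by txn, or held by an ancestor of txn.
        if self.holder is None or self.holder in txn.ancestors():
            self.holder = txn
            return True
        return False  # conflict: held by an unrelated transaction

    def on_commit(self, txn):
        # A committed subtransaction's locks are retained by its parent.
        if self.holder is txn:
            self.holder = txn.parent

top = Txn()
sub = Txn(parent=top)
lk = Lock()
assert lk.acquire(top)
assert lk.acquire(sub)      # child may take over its parent's lock
lk.on_commit(sub)
assert lk.holder is top     # parent retains the lock after child commits
```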

Commit issue? Each transaction will have touched some set of data managers –Includes those touched by nested sub-actions –But not things done by sub-actions that aborted Commit transaction by running 2PC against this set
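A minimal sketch of committing by running 2PC against the set of touched data managers follows. The `prepare`/`finish` names are illustrative, not from any particular product, and the real protocol also forces log records and handles timeouts:

```python
# Sketch: 2PC run by the coordinator against every touched data manager.
def two_phase_commit(managers):
    # Phase 1: ask every touched manager to prepare (force log, hold locks).
    votes = [m.prepare() for m in managers]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the outcome to everyone.
    for m in managers:
        m.finish(decision)
    return decision

class Manager:
    def __init__(self, ok=True):
        self.ok, self.state = ok, None
    def prepare(self):
        return self.ok          # vote: can this manager commit?
    def finish(self, decision):
        self.state = decision   # install the coordinator's decision

ms = [Manager(), Manager(ok=False)]
assert two_phase_commit(ms) == "abort"
assert all(m.state == "abort" for m in ms)
```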

Extending two-phase commit Too costly to run a commit for each nested level Liskov suggests an “optimistic” scheme: –Only run the final top-level commit as a full-fledged protocol –If a conflict arises, then “climb the tree” to a common ancestor with respect to the conflicting lock and ask whether the subtransaction that holds the lock committed or aborted

Other major issues In object-oriented systems, the transactional model can be unnatural: –No way for objects that run concurrently to cooperate with each other (the model suggests that all communication must be through the database! “I write it, you read it”) –Externally visible actions, like “start the motor”, are hard to represent in terms of data or objects in a database Long-running transactions will tend to block out short-running ones, yet if a long-running transaction is split into short ones, we lose the ACID properties

Argus responses? Added –A top-action construct The application can step outside of the current transaction –Additional ways of locking objects Users can design sophisticated locking schemes –Features for launching concurrent activities in separate threads These run as separate actions

Experience with model? Some major object oriented distributed projects have successfully used transactions Seems to work only for database style applications (e.g. the separation of data from computation is natural and arises directly in the application) Seems to work only for short-running applications

Example of a poor fit Transactions don’t work well for a “directory” that has a linked list of indexed records Problem is that a long-running transaction may leave records it walks past locked Technically, this is correct, but in practice it kills performance Many work-arounds have been suggested, but they require sophistication, and even the simplest structures demand it!

Directory represented as linked list
[Figure: linked list Alan → Daniel → Sarah]

Directory represented as linked list
[Figure: linked list Alan → Daniel → Sarah, with Daniel’s record locked] Transaction T0 is updating the data associated with “Daniel”, so no transaction can access “Sarah”

Examples of Proposed Work-Arounds Bill Weihl suggests that the list here was “over-ordered”: for our purposes, perhaps we should view it as an unordered set. Then we can hunt for a record, e.g. “Sarah”, even if some other record is locked Alternatively, we can use the Argus top-level transaction mechanisms, but these are tricky to work with

Problems with top-level actions The top-level action was created from a transaction –It may have updated objects first –But it hasn’t committed yet Which versions of those objects should the top-level action see? –Argus: it sees the updated versions –If the transaction that launched it aborts, the top-level action becomes an “orphan” In some sense it finds an inconsistent world!

Problems with top-level actions Top-level actions can also leak information about the non-committed transaction to the outside world And it may be desired that we coordinate the commit of the resulting set of actions, but Argus has no way to do this

Difficulty with Work-Arounds? Even very simple data structures must be designed by experts. Poorly designed structures will not support adequate concurrency. The implication is that data structures live in libraries and have complex implementations. But real programmers often must customize data structures and would have problems with this constraint

What About nested / distributed cases Both are very costly to support: factor of 10 to 100 performance loss relative to non-distributed non- nested transactional model Many developers now seek support for these features when they purchase products Few actually use these features except in very rare situations!

Experience in real world? Argus and Clouds proposed transactions on general purpose Corba-style objects Neither was picked up by industry; costs too high Encina proposed transactional environment for UNIX file I/O This became a very important OLTP product, but was adopted mostly in database-style applications

File systems that use transactions? QuickSilver (Schmuck) exploits atomic commit for recoverability, but not serializability aspects Hagmann reported positive experiences with a group-commit scheme in Xerox file system Encina extends file system to offer efficient support for transactions on file-structured database objects

Reliability and transactions Transactions are well matched to database model and recoverability goals Transactions don’t work well for non- database applications (general purpose O/S applications) or availability goals (systems that must keep running if applications fail) When building high availability systems, encounter replication issue

Types of reliability Recoverability –Server can restart without intervention in a sensible state –Transactions do give us this High availability –System remains operational during failure –Challenge is to replicate critical data needed for continued operation

Replicating a transactional server Two broad approaches –Just use distributed transactions to update multiple copies of each replicated data item We already know how to do this, with 2PC Each server has “equal status” –Somehow treat replication as a special situation Leads to a primary server approach with a “warm standby”

Replication with 2PC
Our goal will be “1-copy serializability”
–Defined to mean that the multi-copy system behaves indistinguishably from a single-copy system
–Considerable formal and theoretical work has been done on this
As a practical matter
–Replicate each data item
–Transaction manager reads any single copy but updates all copies
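The read-one/write-all discipline can be sketched as follows. The names are hypothetical, and a real transaction manager would also run locking and 2PC around the update:

```python
# Sketch of read-one/write-all replication of a single data item.
class ReplicatedItem:
    def __init__(self, replicas):
        self.replicas = replicas   # list of per-site values

    def read(self):
        return self.replicas[0]    # any single copy suffices for a read

    def update(self, value):
        # Every copy must be written for an update to be 1-copy serializable.
        for i in range(len(self.replicas)):
            self.replicas[i] = value

x = ReplicatedItem([6, 6, 6])
x.update(7)
assert x.read() == 7 and all(v == 7 for v in x.replicas)
```

This already hints at the availability problem the next slides raise: if any one replica is down, the write-all step cannot complete.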

Observation Notice that transaction manager must know where the copies reside In fact there are two models –Static replication set: basically, the set is fixed, although some members may be down –Dynamic: the set changes while the system runs, but only has operational members listed within it Today stick to the static case

Replication and Availability A series of potential issues –How can we update an object during periods when one of its replicas may be inaccessible? –How can 2PC protocol be made fault-tolerant? A topic we’ll study in more depth But the bottom line is: we can’t!

Usual responses?
Quorum methods:
–Each replicated object has an update quorum and a read quorum
–Designed so Qu + Qr > # replicas and Qu + Qu > # replicas
–Idea is that any read or update will overlap with the last update

Quorum example
X is replicated at {a, b, c, d, e} Possible values?
–Qu = 1, Qr = 5 (violates Qu + Qu > 5)
–Qu = 2, Qr = 4 (same issue)
–Qu = 3, Qr = 3
–Qu = 4, Qr = 2
–Qu = 5, Qr = 1 (violates availability)
Probably prefer Qu = 4, Qr = 2
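The quorum constraints can be checked mechanically; this short sketch enumerates the legal (Qu, Qr) pairs for the five-replica example:

```python
# Enumerate legal quorum choices for N = 5 replicas.
N = 5
legal = [(qu, qr)
         for qu in range(1, N + 1) for qr in range(1, N + 1)
         if qu + qr > N      # every read overlaps the last update
         and qu + qu > N]    # any two updates overlap

assert (4, 2) in legal       # the choice the slide prefers
assert (1, 5) not in legal   # Qu = 1 violates Qu + Qu > 5
assert (2, 4) not in legal   # same issue
```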

Things to notice Even reading a data item requires that multiple copies be accessed! This could be much slower than normal local access performance Also, notice that we won’t know if we succeeded in reaching the update quorum until we get responses –Implies that any quorum replication scheme needs a 2PC protocol to commit

Next issue? Now we know that we can solve the availability problem for reads and updates if we have enough copies What about for 2PC? –Need to tolerate crashes before or during runs of the protocol –A well-known problem

Availability of 2PC It is easy to see that 2PC is not able to guarantee availability –Suppose that manager talks to 3 processes –And suppose 1 process and manager fail –The other 2 are “stuck” and can’t terminate the protocol

What can be done? We’ll revisit this issue soon Basically, –Can extend to a 3PC protocol that will tolerate failures if we have a reliable way to detect them –But network problems can be indistinguishable from failures –Hence there is no commit protocol that can tolerate failures Anyhow, cost of 3PC is very high

Conclusion? We set out to replicate data for increased availability And concluded that –Quorum scheme works for updates –But commit is required –And represents a vulnerability Other options?

Other options We mentioned primary-backup schemes These are a second way to solve the problem Based on the log at the data manager

Server replication Suppose the primary sends the log to the backup server It replays the log and applies committed transactions to its replicated state If primary crashes, the backup soon catches up and can take over
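Log shipping with replay of committed transactions only might look like this sketch. The record format and names are hypothetical:

```python
# Sketch of a backup that replays the primary's shipped log,
# applying only transactions whose commit record has arrived.
class Backup:
    def __init__(self):
        self.state = {}
        self.pending = {}   # txid -> list of (key, value) updates

    def receive(self, record):
        kind, txid, payload = record
        if kind == "update":
            self.pending.setdefault(txid, []).append(payload)
        elif kind == "commit":
            # Apply the transaction's updates to the replicated state.
            for key, value in self.pending.pop(txid, []):
                self.state[key] = value
        elif kind == "abort":
            self.pending.pop(txid, None)

b = Backup()
for rec in [("update", 1, ("x", 10)), ("commit", 1, None),
            ("update", 2, ("x", 99))]:   # commit for txn 2 never arrives
    b.receive(rec)
assert b.state == {"x": 10}
```

Note the failure mode the following slides discuss: if the primary crashes after committing transaction 2 but before shipping its commit record, the backup takes over slightly stale.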

Primary/backup
[Figure: primary and backup servers; the primary ships its log to the backup] Clients initially connected to primary, which keeps backup up to date. Backup tracks the log

Primary/backup
[Figure: primary crashed; backup replays the log] Primary crashes. Backup sees the channel break, applies committed updates. But it may have missed the last few updates!

Primary/backup
[Figure: clients reconnected to the backup] Clients detect the failure and reconnect to backup. But some clients may have “gone away”. Backup state could be slightly stale. New transactions might suffer from this

Issues? Under what conditions should the backup take over? –Revisits the consistency problem seen earlier with clients and servers –Could end up with a “split brain” Also notice that this still needs 2PC to ensure that primary and backup stay in the same state!

Split brain: reminder
[Figure: primary and backup servers; the primary ships its log to the backup] Clients initially connected to primary, which keeps backup up to date. Backup follows the log

Split brain: reminder
[Figure: some links between primary and backup broken, but not all] Transient problem causes some links to break but not all. Backup thinks it is now primary, primary thinks backup is down

Split brain: reminder
[Figure: some clients still on the primary, one switched to the backup, one disconnected from both] Some clients still connected to primary, but one has switched to backup and one is completely disconnected from both

Implication? A strict interpretation of ACID leads to conclusions that –There are no ACID replication schemes that provide high availability Most real systems solve by weakening ACID

Real systems They use primary-backup with logging But they simply omit the 2PC –Server might take over in the wrong state (may lag state of primary) –Can use hardware to reduce or eliminate split brain problem

How does hardware help? Idea is that primary and backup share a disk Hardware is configured so only one can write the disk If server takes over it grabs the “token” Token loss causes primary to shut down (if it hasn’t actually crashed)
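The token idea can be illustrated as follows. This is a software stand-in for what is really a hardware interlock on the shared disk, and all names are hypothetical:

```python
# Sketch of disk-based fencing: only the token holder may write,
# so a deposed primary cannot corrupt state after a takeover.
class SharedDisk:
    def __init__(self):
        self.token_holder = "primary"

    def write(self, who, data):
        if who != self.token_holder:
            # Token loss: the old primary must shut itself down.
            raise PermissionError(who + " lost the token; must shut down")
        return True

    def grab_token(self, who):
        self.token_holder = who   # backup seizes the token at takeover

disk = SharedDisk()
assert disk.write("primary", b"ok")
disk.grab_token("backup")
try:
    disk.write("primary", b"stale")   # old primary is fenced off
    fenced = False
except PermissionError:
    fenced = True
assert fenced
```

Because the hardware enforces single-writer access, even a split-brain episode cannot produce two servers both mutating the shared state.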

Reconciliation This is the problem of fixing the transactions impacted by the lack of 2PC Usually just a handful of transactions –They committed, but the backup doesn’t know because it never saw the commit record –Later, the server recovers and we discover the problem Need to apply the missing ones Also causes cascaded rollback Worst case may require human intervention

Summary Reliability can be understood in terms of –Availability: system keeps running during a crash –Recoverability: system can recover automatically Transactions are best for latter Some systems need both sorts of mechanisms, but there are “deep” tradeoffs involved