Trafodion Distributed Transaction Management

Slides:



Advertisements
Similar presentations
CM20145 Concurrency Control
Advertisements

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
Company LOGO MVCC on Flash Memory Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China,
Omid Efficient Transaction Management and Incremental Processing for HBase Copyright © 2013 Yahoo! All rights reserved. No reproduction or distribution.
CS6223: Distributed Systems
1 Chapter 3. Synchronization. STEMPusan National University STEM-PNU 2 Synchronization in Distributed Systems Synchronization in a single machine Same.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
Nested Transactional Memory: Model and Preliminary Architecture Sketches J. Eliot B. Moss Antony L. Hosking.
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Concurrency Control and Recovery In real life: users access the database concurrently, and systems crash. Concurrent access to the database also improves.
Recovery 10/18/05. Implementing atomicity Note, when a transaction commits, the portion of the system implementing durability ensures the transaction’s.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Distributed Snapshots –Termination detection Election algorithms –Bully –Ring.
Persistent State Service 1 Distributed Object Transactions  Transaction principles  Concurrency control  The two-phase commit protocol  Services for.
Chapter 8 : Transaction Management. u Function and importance of transactions. u Properties of transactions. u Concurrency Control – Meaning of serializability.
Reliability and Partition Types of Failures 1.Node failure 2.Communication line of failure 3.Loss of a message (or transaction) 4.Network partition 5.Any.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
Transaction Management WXES 2103 Database. Content What is transaction Transaction properties Transaction management with SQL Transaction log DBMS Transaction.
Transaction Management Chapter 9. What is a Transaction? A logical unit of work on a database A logical unit of work on a database An entire program An.
Locking Key Ranges with Unbundled Transaction Services 1 David Lomet Microsoft Research Mohamed Mokbel University of Minnesota.
Distributed Deadlocks and Transaction Recovery.
1 Module 6 Log Manager COP Log Manager Log knows everything. It is the temporal database –The online durable data are just their current versions.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 10 Transaction Management.
Switch off your Mobiles Phones or Change Profile to Silent Mode.
Concurrency Server accesses data on behalf of client – series of operations is a transaction – transactions are atomic Several clients may invoke transactions.
Concurrency Control in Database Operating Systems.
Computer Science Lecture 13, page 1 CS677: Distributed OS Last Class: Canonical Problems Election algorithms –Bully algorithm –Ring algorithm Distributed.
XA Transactions.
EbiTrack Architecture Version 1.0 September 24, 2012.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “DATABASE RECOVERY” (PART – 2) Academic Year 2014 Spring.
1 CSE 480: Database Systems Lecture 24: Concurrency Control.
NOEA/IT - FEN: Databases/Transactions1 Transactions ACID Concurrency Control.
Multidatabase Transaction Management COP5711. Multidatabase Transaction Management Outline Review - Transaction Processing Multidatabase Transaction Management.
10 1 Chapter 10 - A Transaction Management Database Systems: Design, Implementation, and Management, Rob and Coronel.
Computer Science Lecture 13, page 1 CS677: Distributed OS Last Class: Canonical Problems Election algorithms –Bully algorithm –Ring algorithm Distributed.
Transactions in PostgreSQL
Recovery in Distributed Systems:
How did it start? • At Google • • • • Lots of semi structured data
Last Class: Canonical Problems
Database Applications (15-415) DBMS Internals- Part XIII Lecture 22, November 15, 2016 Mohammad Hammoud.
Introduction to NewSQL
Two phase commit.
MCU cluster Cristian Alexe 18 October 2010.
The Vocabulary of Performance Tuning
Predictive Performance
Database management concepts
Distributed P2P File System
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
Concurrency Control II (OCC, MVCC)
Lecture 6: Transactions
Chapter 10 Transaction Management and Concurrency Control
Database Applications (15-415) DBMS Internals- Part XIII Lecture 25, April 15, 2018 Mohammad Hammoud.
Replication and Recovery in Distributed Systems
Lecture 22: Consistency Models, TM
Database management concepts
The Vocabulary of Performance Tuning
Lecture 20: Intro to Transactions & Logging II
Transactions and Concurrency
Database Recovery 1 Purpose of Database Recovery
Exercises for Chapter 14: Distributed Transactions
Distributed Databases Recovery
UNIVERSITAS GUNADARMA
Transactions in Distributed Systems
Database Applications (15-415) DBMS Internals- Part XIII Lecture 24, April 14, 2016 Mohammad Hammoud.
The Vocabulary of Performance Tuning
Concurrency control (OCC and MVCC)
Presentation transcript:

Trafodion Distributed Transaction Management November, 2014

Trafodion Distributed Transaction Management Scalable Architecture Node 1 SQL Process Transaction Manager Library Resource Manager Library Transaction Manager HBase trx Region Server HBase Region Server Node 2 SQL Process Transaction Manager Library Resource Manager Library Transaction Manager HBase trx Region Server HBase Region Server Node n SQL Process Transaction Manager Library Resource Manager Library Transaction Manager HBase trx Region Server HBase Region Server ... Implemented using HBase coprocessors

Trafodion Distributed Transaction Management Peeling the onion – Component Architecture SQL Transaction initiating process SQL JNI Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete begin/commit/abort(tx1 ) register(region, tx1) Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Jav a C++ transactional aggregation(tx1) Transaction Manager Log Meta info/state about transaction (more than one region in a txn) HBase table(s) TLOG Jav a get/put/ delete(tx1 ) open/perform/ close scanner(tx1) commitRequest/commit/abort(tx1) Coprocessors implemented TrxRegionEndpoint implements SQL transactions TrxRegionObserver implements recovery process HBase RegionServer HRegion HRegion HRegion HBase Write Ahead Log Region level transaction info Each region server has its own HLOG HLOG

Trafodion Distributed Transaction Management BEGIN WORK – BEGINTRANSACTION SQL Transaction initiating process 1 BEGINTRANSACT ION or BEGINTX transid is allocated by TM for distributed support Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete 2 begintransaction request transid 6 Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client 3 new transaction object allocated Assign new transid Txn Object 4 TransactionManager. beginTransaction(transid) 5 new TransactionState Txn State HBase RegionServer HRegion HRegion HRegion

Trafodion Distributed Transaction Management get / put / delete SQL Transaction initiating process SQL Transaction initiating process Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete 4 call register (region, transid) 1 put(transid, region) Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Txn Object Txn State TransactionStateObj Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Adds participant to the participating processes list of the Transaction Object 5 If region is not registered for this transaction, calls HBaseTxn.registerRegion 6 2 Look up TransactionState by transid – create object if not there 7 If region not in participatingRegions list of TransactionState add it & initiate registration 3 TransactionManager.registerRegion which adds it to the TransactionState.participatingRegions list 8 TransactionalTable.put(TransactionState, region) 9 TrxRegionEndpoint.put(transid) for region HBase RegionServer HBase RegionServer HRegion HRegion HRegion HRegion HRegion HRegion 10 TrxRegionEndpoint.beginTransaction(transid) for region executed if transaction not initiated for the region 11 Update region transaction object context to include puts, deletes

Trafodion Distributed Transaction Management COMMIT WORK – ENDTRANSACTION – Phase 1, prepare SQL Transaction initiating process SQL Transaction initiating process 1 ENDTRANSACTIO N Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete 2 endtransaction Abort if conflict encountered 9 Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Txn Object Txn State Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client 3 prepare_branches HBaseTxn.prepareCommit 4 TransactionManager. prepareCommit 5 TrxRegionEndpoint.commitRequest(transid) against each region in participatingRegions list, sent in parallel 8 Votes abort, read-only, or commit to transaction manager HBase RegionServer HBase RegionServer 6 Region endpoint coprocessor checks for conflicts / concurrency issue If conflict it votes abort else checks for writes in region If no writes, votes read-only & is excluded from phase 2 HRegion HRegion HRegion HRegion HRegion HRegion HLOG If writes performed, context for transid, including all updates, is forced written to HLOG HLOG 7

Trafodion Distributed Transaction Management COMMIT WORK – ENDTRANSACTION – Phase 2, commit If at least one region voted commit & the rest voted commit or read-only SQL Transaction initiating process SQL Transaction initiating process Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete commit 10 Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Txn Object Txn State Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client 1 commit_branches 3 commit_branches HBaseTxn.doCommit 9 doCommit forgets transaction & returns control to TM 4 TransactionManager. doCommit 2 Forces write of committed trans state record TLOG 5 TrxRegionEndpoint.commit(transid) against each region in participatingRegions list, sent in parallel Records hashed key, ASN, txn id & state, and tables in txn HBase RegionServer HBase RegionServer 8 Unforced write of forgotten trans state record Used for control point / aging process discussed later HRegion HRegion HRegion HRegion HRegion HRegion 6 Context for transid as committed txn is written unforced to HLOG HLOG 7 All puts and deletes are written to HBase tables from in-memory transaction state HLOG Reduces recovery time

Trafodion Distributed Transaction Management ABORT WORK – ABORTTRANSACTION SQL Transaction initiating process SQL Transaction initiating process Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete 1 Abort Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Txn Object Txn State Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client 2 rollback_branches 4 rollback_branches HBaseTxn.abort 10 TM does forget processing No writes to log 5 TransactionManager. abort 9 Abort forgets transaction & returns control to TM TLOG 3 HRegion.abort sets status of txn to abort Removes txn from commit list if previously voted commit Remove txn from pending list does not write trans record If forgotten records not written, presume abort 6 TrxRegionEndpoint.abortTransaction(transid) against each region in participatingRegions list, sent in parallel 7 HBase RegionServer HBase RegionServer HRegion HRegion HRegion HRegion HRegion HRegion HLOG HLOG 8 If writes associated with txn, unforced write of abort to HLOG Reduces recovery time

Trafodion Distributed Transaction Management Transaction Recovery – timing of failure State at failure Action at failure Failed region(s) start up Updates are in region’s memory Initiate abort processing No recovery needed Prepare: Some regions complete If region had not completed Prepare phase before failure, no recovery needed. If it had, then abort processing initiated. Prepare: All regions complete Abort processing for all regions that had updates Commit: Some regions complete Initiate commit processing Apply updates from HLOG to tables if not committed

Trafodion Distributed Transaction Management Transaction Recovery SQL Transaction initiating process SQL Transaction initiating process Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete Transaction Manager Library Resource Manager Library TrxRegionEndpoint coprocessor client TransactionalTable TransactionalScanner Transactional operations get/put/delete TM builds in-doubt region list and initiates recovery for those regions 4 Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Zookeeper Bulletin Board for TM 7 In-doubt transaction No prepare record  abort transaction Commit record  re-drive commit in redo processing Abort record  do nothing Prepare record  needs to know commit or abort? ... Update txn state obj from last state of transid, & participating regions in TLOG 8 Recovery thread re-drives phase 2 TLOG Collates list of in-doubt transactions from all regions and creates empty transaction state object for each txn 6 Txn State Table names mapped to regions Accommodates region splits & moves 5 Gather in-doubt transactions from in-doubt regions 3 Each Region with in-doubt transactions posts need for recovery to bulletin of TM that originated the transaction if not yet posted 9 Commit / Abort will be done as normal for in-doubt txns 1 HBase RegionServer 1 HBase RegionServer When failed server(s) are reintegrated, Region Server reads HLOG. If its empty, indicating clean shutdown, it resumes txn processing. Otherwise it initiates recovery for regions. HRegion HRegion HRegion HRegion HRegion HRegion HLOG 2 All HRegions build in-doubt txn list by Region observer coprocessor HLOG 10 In-doubt regions deregister from bulletin board TM disables transactions during recovery

Trafodion Distributed Transaction Management Transaction Recovery – Control Point processing Used to re-drive transaction recovery After server or cluster failure to ensure data consistency given in-flight operations at time of failure Control points written to Control Point table at a configurable 1interval, currently set at 2 minutes Re-drives of transaction commits (redo) driven by recovery needs of regions Used for aging TLOG entries prior to 2 control points can be aged out and only two entries maintained in control point table Commit records rewritten to log, along with writing txn state of all running transactions, until all commits received and forgotten record written SQL Transaction initiating process Transaction Manager Library Forced write of key & 2ASN for txn state record written to TLOG Transaction Manager DTM Core Branch I/F Transaction Management JNI TransactionManager TrxRegionEndpoint coprocessor client Txn Object CP TLOG Txn State Unforced grouped writes of txn record for each running txn 1control point interval is represented by two consecutive control point records where the first record defines the ASN from the start of the interval and the second defines the ASN at the end of the record 2monotonically increasing value indicating the audit write sequence number

Trafodion Distributed Transaction Management Addressing the requirements Current State Use HBase Endpoint coprocessors Trafodion has Transaction Manager (TM) processes, written in C++, supporting: Recovery on client / region server crash Transactions involving multiple HBase clients (transaction propagation) Clients can use a new, transactional client derived from regular HBase client Trafodion as single jar file, used by region server, TM & clients. Compatible with HBase 0.98 version. Recovery from region server Goals Optional, no penalty if not used Very low overhead – tight integration with the region server helps Avoid additional processes Avoid non-Java code Avoid version incompatibilities How to achieve goals Provide Java implementation of TM code for the following functionality, if needed: Txn propagation Manageability Recovery

Trafodion Distributed Transaction Management Future opportunities Optimizations For single region transactions Transaction flow Isolation support Repeatable read Snapshot isolation Serialized snapshot isolation Row Locking paradigm for certain tables Tables involved in long running transactions Pessimistic locking for highly concurrent operations High Availability Recovery from TM, or node failure Accommodate Region Splitting & Balancing Manageability Transaction object info/metrics via REST APIs

Thank you