Concurrent Cache-Oblivious B-trees Using Transactional Memory

Concurrent Cache-Oblivious B-trees Using Transactional Memory Jim Sukha Bradley Kuszmaul MIT CSAIL June 10, 2006

Thought Experiment Imagine that, one day, you are assigned the following task: Enclosed is code for a serial, cache-oblivious B-tree. We want a reasonably efficient parallel implementation that works for disk-resident data. Attach: COB-tree.tar.gz PS. We want to be able to restore the data to a consistent state after a crash too. PPS. Our deadline is next week. Good luck!

Concurrent COB-tree? Question: How can one program a concurrent, cache-oblivious B-tree? Approach: We employ transactional memory. What complications does I/O introduce?

Potential Pitfalls Involving I/O Suppose our data structure resides on disk. We might need to make explicit I/O calls to transfer blocks between memory and disk. But a cache-oblivious algorithm doesn’t know the block size B! We might need buffer management code if the data doesn’t fit into main memory. We might need to unroll I/O if we abort a transaction that has already written to disk.

Our Solution: Libxac We have implemented Libxac, a page-based transactional memory system that operates on disk-resident data. Libxac supports ACID transactions on a memory-mapped file. Using Libxac, we are able to implement a complex data structure that operates on disk-resident data, e.g. a cache-oblivious B-tree.

Libxac Handles Transaction I/O
Pitfall: We might need to make explicit I/O calls to transfer blocks between memory and disk. Libxac: Similar to mmap, Libxac provides a function, xMmap, so we can operate on disk-resident data without knowing the block size.
Pitfall: We might need buffer management code if the data doesn't fit into main memory. Libxac: As with mmap, the OS automatically buffers pages in memory.
Pitfall: We might need to unroll I/O if we abort a transaction that has already written to disk. Libxac: Since Libxac implements multiversion concurrency control, we still have the original version of a page even if a transaction aborts.
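The mmap analogy can be made concrete with a minimal sketch that uses only plain POSIX calls, not Libxac. It increments the first int of a file through a shared mapping, with no explicit read/write calls and no knowledge of the block size. The function name increment_first_int and the choice to grow a short file with ftruncate are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Increment the first int of a file through a shared memory mapping.
 * Returns the new value, or -1 on error. (Hypothetical helper; this is
 * plain POSIX mmap, with none of Libxac's transactional guarantees.) */
int increment_first_int(const char *path, size_t length)
{
    assert(length >= sizeof(int));

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Make sure the file is at least as long as the mapping. */
    struct stat st;
    if (fstat(fd, &st) != 0 ||
        (st.st_size < (off_t)length && ftruncate(fd, (off_t)length) != 0)) {
        close(fd);
        return -1;
    }

    int *x = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (x == MAP_FAILED)
        return -1;

    x[0]++; /* ordinary memory access; the OS pages the data in and out */
    int result = x[0];

    munmap(x, length);
    return result;
}
```

Calling increment_first_int("input.db", 4096) twice on a fresh file would yield 1 and then 2. Unlike xMmap, this gives no atomicity or isolation: a crash or a concurrent writer can leave the file in an intermediate state.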

Outline Programming with Libxac Cache-Oblivious B-trees

Example Program with Libxac

int main(void) {
    int* x;
    int status = FAILURE;

    /* Runtime initialization function. For durable transactions,
       logs are stored in the specified directory (see note below). */
    xInit("/logs", DURABLE);

    /* Transactionally maps the first page of the input file. */
    x = xMmap("input.db", 4096);

    /* Transaction body. The body can be a complex function
       (e.g., a cache-oblivious B-tree insert!). */
    while (status != SUCCESS) {
        xbegin();
        x[0]++;
        status = xend();
    }

    xMunmap(x);     /* Unmap the region. */
    xShutdown();    /* Shutdown runtime. */
    return 0;
}

* Currently Libxac logs the transaction commits, but we haven't implemented the recovery program yet.

Libxac Memory Model
Aborted transactions are visible to the programmer, so the programmer must explicitly retry the transaction. Control flow always proceeds from xbegin() to xend(), so the transaction body can contain system/library calls. At xend(), all changes to the xMmap'ed region are discarded on FAILURE or committed on SUCCESS. Aborted transactions always see a consistent state. Read-only transactions can always succeed.
* Libxac supports concurrent transactions on multiple processes, not threads.
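The commit-or-discard rule and the explicit retry loop can be illustrated with a toy, single-process model. This is not Libxac's implementation: every name here (toy_region, toy_begin, toy_end, toy_foreign_commit) is hypothetical, and a conflict is simulated by bumping a version counter rather than by a second process.

```c
#include <assert.h>
#include <string.h>

enum { TOY_PAGE_INTS = 1024 };
enum toy_status { TOY_FAILURE = 0, TOY_SUCCESS = 1 };

/* Toy model of one transactional page (hypothetical, not Libxac). */
struct toy_region {
    int committed[TOY_PAGE_INTS]; /* last committed version of the page */
    int working[TOY_PAGE_INTS];   /* private copy used by the transaction */
    unsigned version;             /* bumped on every commit */
    unsigned snapshot;            /* version observed at toy_begin */
};

/* Start a transaction: take a consistent private copy of the page. */
void toy_begin(struct toy_region *r)
{
    memcpy(r->working, r->committed, sizeof r->working);
    r->snapshot = r->version;
}

/* End a transaction: commit the private copy if nobody else committed
 * since toy_begin, otherwise leave the committed page untouched. */
enum toy_status toy_end(struct toy_region *r)
{
    if (r->snapshot != r->version)
        return TOY_FAILURE; /* changes are simply never published */
    memcpy(r->committed, r->working, sizeof r->committed);
    r->version++;
    return TOY_SUCCESS;
}

/* Simulate a commit by another process that adds 100 to slot 0. */
void toy_foreign_commit(struct toy_region *r)
{
    r->committed[0] += 100;
    r->version++;
}

/* Increment slot 0 with an explicit retry loop, injecting the given
 * number of conflicts. Returns how many attempts were needed. */
int toy_increment(struct toy_region *r, int conflicts_to_inject)
{
    int attempts = 0;
    enum toy_status status = TOY_FAILURE;
    while (status != TOY_SUCCESS) {
        toy_begin(r);
        r->working[0]++;
        if (conflicts_to_inject > 0) {
            toy_foreign_commit(r);
            conflicts_to_inject--;
        }
        status = toy_end(r); /* control always reaches the end call */
        attempts++;
    }
    return attempts;
}
```

With a zeroed region and two injected conflicts, toy_increment takes three attempts and leaves committed[0] at 201: the two aborted increments were discarded, and only the final one was published on top of the foreign commits.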

Implementation Sketch
Libxac detects memory accesses by using a SIGSEGV handler to catch a memory protection violation on a page that has been mmap'ed. This mechanism is slow for normal transactions (time for mmap and the SIGSEGV handler: ~10 µs), but it is efficient if we must perform disk I/O to log transaction commits (time to access disk: ~10 ms).

Is xMmap practical? Experiment on a 4-processor AMD Opteron, performing 100,000 insertions of elements with random keys into a B-tree. Each insert is a separate transaction. Libxac and Berkeley DB (BDB) both implement group commit. The B-tree and the COB-tree both use Libxac. Note that none of the three data structures has been properly tuned. Conclusion: We should achieve good performance.

Outline Programming with Libxac Cache-Oblivious B-trees

What is a Cache-Oblivious B-tree? A cache-oblivious B-tree (e.g., [BDFC00]) is a dynamic dictionary data structure that supports searches, insertions/deletions, and range queries. A cache-oblivious algorithm or data structure does not know system parameters (e.g., the block size B). Theorem [FLPR99]: a cache-oblivious algorithm that is optimal for a two-level memory hierarchy is also optimal for a multi-level hierarchy.

Cache-Oblivious B-Tree Example
The COB-tree can be divided into two pieces: a packed memory array (PMA) that stores the data in order but contains gaps, and a static cache-oblivious binary tree that indexes the packed memory array.
[Figure: Static Cache-Oblivious Tree above a Packed Memory Array (PMA)]
Static tree, level by level: 21 | 10 45 | 4 16 38 54 | 4 10 16 21 38 45 54 83
PMA keys in sorted order (gaps omitted): 1 4 6 7 10 13 15 16 21 23 24 31 38 39 40 45 48 54 56 59 70 83
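The two-piece structure can be sketched as a lookup over a gapped array. This is a simplified illustration, not the authors' code: the PMA is an int array split into fixed sections of four slots, EMPTY marks a gap, and a binary search over per-section maxima stands in for the static tree (the tree's leaves in the example hold exactly those maxima). The section size, the sentinel, and the function names are assumptions, and no cache-oblivious layout is attempted.

```c
#include <assert.h>
#include <stddef.h>

enum { SECTION_SLOTS = 4, EMPTY = -1 };

/* Largest key stored in section s; assumes the section is non-empty
 * and that all keys are non-negative. */
static int section_max(const int *pma, size_t s)
{
    int max = EMPTY;
    for (size_t i = 0; i < SECTION_SLOTS; i++)
        if (pma[s * SECTION_SLOTS + i] > max)
            max = pma[s * SECTION_SLOTS + i];
    return max;
}

/* Index of the first section whose maximum is >= key, or n_sections
 * if every key in the array is smaller. Plays the role of the index. */
size_t pma_find_section(const int *pma, size_t n_sections, int key)
{
    assert(key >= 0);
    size_t lo = 0, hi = n_sections;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (section_max(pma, mid) < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Slot index of key in the PMA, or -1 if the key is absent. */
long pma_search(const int *pma, size_t n_sections, int key)
{
    size_t s = pma_find_section(pma, n_sections, key);
    if (s == n_sections)
        return -1;
    for (size_t i = 0; i < SECTION_SLOTS; i++)
        if (pma[s * SECTION_SLOTS + i] == key)
            return (long)(s * SECTION_SLOTS + i);
    return -1;
}
```

With the example's keys laid out in eight sections whose maxima are 4, 10, 16, 21, 38, 45, 54, and 83, a search for 37 is directed to the section ending in 38 and reports that the key is absent. The exact gap positions in such a layout are a guess, since only the key order is known.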

Cache-Oblivious B-Tree Insert
To insert a key of 37:
1. Find the correct section of the PMA using the static tree.
2. Insert into the PMA. This step may cause a rebalance of the PMA.
3. Fix the static tree.
[Figure: Static Cache-Oblivious Tree and PMA before and after the insert]
Before the insert, static tree level by level: 21 | 10 45 | 4 16 38 54 | 4 10 16 21 38 45 54 83
Before the insert, PMA keys in sorted order (gaps omitted): 1 4 6 7 10 13 15 16 21 23 24 31 38 39 40 45 48 54 56 59 70 83
After step 2, PMA keys in sorted order (gaps omitted): 1 4 6 7 10 13 15 16 21 23 24 31 37 38 39 40 45 48 54 56 59 70 83
After step 3, static tree level by level: 21 | 10 40 | 4 16 37 56 | 4 10 16 21 37 40 56 83
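The PMA part of the insert (finding the section and inserting, with a possible rebalance) can be sketched as follows. This is a deliberately simplified version under stated assumptions: sections have four slots, keys are packed to the left of each section, EMPTY marks a gap, and a full section triggers an even redistribution over the whole array. A real PMA rebalances only the smallest window that satisfies its density thresholds, so the resulting layout will differ from the one on the slide. Fixing the static tree is not shown. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOTS = 4, MAX_SECTIONS = 64, EMPTY = -1 };

/* Number of keys in section s (keys are packed to the left). */
static size_t section_count(const int *pma, size_t s)
{
    size_t n = 0;
    while (n < SLOTS && pma[s * SLOTS + n] != EMPTY)
        n++;
    return n;
}

/* First section whose largest key is >= key; the last section if none. */
static size_t find_section(const int *pma, size_t n_sections, int key)
{
    for (size_t s = 0; s + 1 < n_sections; s++) {
        size_t n = section_count(pma, s);
        if (n > 0 && pma[s * SLOTS + n - 1] >= key)
            return s;
    }
    return n_sections - 1;
}

/* Insert key. Returns 0 if it fit in its section, 1 if the array was
 * rebalanced, and -1 (array unchanged) if there is no room left. */
int pma_insert(int *pma, size_t n_sections, int key)
{
    assert(key >= 0 && n_sections > 0 && n_sections <= MAX_SECTIONS);
    size_t s = find_section(pma, n_sections, key);
    size_t n = section_count(pma, s);

    if (n < SLOTS) {
        /* Shift larger keys right by one slot and drop the key in place. */
        size_t i = n;
        while (i > 0 && pma[s * SLOTS + i - 1] > key) {
            pma[s * SLOTS + i] = pma[s * SLOTS + i - 1];
            i--;
        }
        pma[s * SLOTS + i] = key;
        return 0;
    }

    /* Section full: gather every key, plus the new one, in sorted order. */
    int keys[MAX_SECTIONS * SLOTS + 1];
    size_t total = 0;
    int placed = 0;
    for (size_t slot = 0; slot < n_sections * SLOTS; slot++) {
        if (pma[slot] == EMPTY)
            continue;
        if (!placed && pma[slot] > key) {
            keys[total++] = key;
            placed = 1;
        }
        keys[total++] = pma[slot];
    }
    if (!placed)
        keys[total++] = key;
    if (total > n_sections * SLOTS)
        return -1;

    /* Spread the keys evenly, packing each section's share to the left. */
    size_t next = 0;
    for (size_t sec = 0; sec < n_sections; sec++) {
        size_t share = total / n_sections + (sec < total % n_sections ? 1 : 0);
        for (size_t i = 0; i < SLOTS; i++)
            pma[sec * SLOTS + i] = (i < share) ? keys[next++] : EMPTY;
    }
    return 1;
}
```

Even in this reduced form, one insert can rewrite slots across many sections, which is the difficulty the locking questions point at: the set of pages an insert touches is not known in advance.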

Cache-Oblivious B-Tree Insert
Insert is a complex operation. If we wanted to use locks, what is the locking protocol? What is the right (cache-oblivious?) lock granularity?

Conclusions
A page-based TM system such as Libxac represents a good match for disk-resident data structures: the per-page overheads of TM are small compared to the cost of I/O.
A page-based TM system such as Libxac is easy to program with: Libxac allows us to program a concurrent, disk-resident data structure with ACID properties as though it were stored in memory.