Designing Concurrent Search Structure Algorithms Dennis Shasha.


What is a Search Structure?
A data structure (typically a B-tree, hash structure, R-tree, etc.) that supports a dictionary.
The operations are: insert a key-value pair, delete a key-value pair, and search for a key-value pair.

How to make a search structure algorithm concurrent
Naïve approach: use two-phase locking (but then at the very least the root is read-locked, so lock conflicts are frequent).
Semi-naïve approach: use hierarchical tree locking: lock the root; afterwards, lock node n only if you hold a lock on the parent of n. (This still tends to hold locks high in the tree.)
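
The semi-naïve approach can be sketched as follows. This is a minimal illustration, assuming a plain binary search tree with one mutex per node; the names `Node`, `insert`, and `search` are hypothetical and not taken from the slides. The searcher locks a child only while it still holds the parent's lock, then releases the parent.

```python
import threading


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.lock = threading.Lock()  # one lock per node (an assumption)


def insert(root, key):
    """Single-threaded helper, used only to build the example tree."""
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return
        node = child


def search(root, key):
    """Hierarchical locking: lock a child only while holding its parent."""
    node = root
    node.lock.acquire()
    path = [node.key]
    while node.key != key:
        child = node.left if key < node.key else node.right
        if child is None:
            node.lock.release()
            return False, path
        child.lock.acquire()   # the parent's lock is still held here
        node.lock.release()    # only now is the parent released
        node = child
        path.append(node.key)
    node.lock.release()
    return True, path


root = Node(50)
for k in (70, 10, 35):
    insert(root, k)

print(search(root, 35))
print(search(root, 60))
```

Every search begins by locking the root, which is why this scheme still concentrates contention near the top of the tree.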

How can we do better: fundamental insight
In a search structure algorithm, all that we really care about is that we implement the dictionary operations correctly.
Operations on the structure need not even be serializable, provided they maintain certain constraints.

Train Your Intuition: parable of the library
Imagine a library with books. It’s a little old-fashioned, so there are still card catalogues that identify the shelf where a book is held.
Bob wants to get a book B.
Alice is reorganizing the library by moving books from shelf to shelf and then changing the card catalogue.

Parable of the library: interleaving of ops
Bob 1: look up book B in the catalogue.
Bob 2: read “go to shelf S”.
Bob 3: start walking, but see a friend.
Alice 1: move several books from S to S’, leaving a note.
Alice 2: change the catalogue so B maps to S’.
Bob 4: go to S, follow the note to S’.
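
The interleaving above can be replayed deterministically. The following is a small sketch, assuming dictionaries for the catalogue and shelves; all function and variable names are hypothetical. Bob reads the catalogue before Alice acts and reaches shelf S only after she has finished.

```python
catalogue = {"B": "S"}
shelves = {
    "S": {"books": {"A", "B", "C"}, "note": None},
    "S'": {"books": set(), "note": None},
}


def bob_lookup(book):
    # Bob 1 and Bob 2: read the shelf name from the catalogue.
    return catalogue[book]


def alice_move(books, src, dst):
    # Alice 1: move the books and leave a note on the old shelf.
    shelves[src]["books"] -= books
    shelves[dst]["books"] |= books
    shelves[src]["note"] = (frozenset(books), dst)


def alice_update_catalogue(books, dst):
    # Alice 2: point the catalogue at the new shelf.
    for book in books:
        catalogue[book] = dst


def bob_fetch(book, shelf):
    # Bob 4: go to the shelf he read earlier, following notes if needed.
    visited = []
    while shelf is not None:
        visited.append(shelf)
        if book in shelves[shelf]["books"]:
            return shelf, visited
        note = shelves[shelf]["note"]
        shelf = note[1] if note is not None and book in note[0] else None
    return None, visited


target = bob_lookup("B")                  # Bob still believes the book is on S
alice_move({"B", "C"}, "S", "S'")
alice_update_catalogue({"B", "C"}, "S'")
found_on, visited = bob_fetch("B", target)
print(found_on, visited)
```

Bob visits two shelves, which no serial execution would produce, yet he ends up with the book.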

Parable of the library: observations
Not conflict-preserving serializable:
Bob → Alice (Bob reads the catalogue, then Alice changes it)
Alice → Bob (Alice modifies S before Bob reads it)
Indeed, in no serial execution would Bob go to two shelves.
Yet the execution is completely OK!

Parable of the library: what’s going on?
All we care about is that:
1. The structure is OK after Alice finishes.
2. Bob gets his book if it’s there.
We want to find a general theory for this.
Ref: the Vossen and Weikum book, and “Concurrent Search Structure Algorithms”, D. Shasha and N. Goodman, ACM Transactions on Database Systems, vol. 13, no. 1, March 1988.

Good Structure for any Dictionary Data Structure
A dictionary holds a set of key-value pairs. Values don’t matter for our theory, so consider just the set of keys that could be present, denoted the keyspace. Example: all natural numbers.
From the root (in general, from any root), a search must be able to navigate to a node n such that either n holds the key being sought or no node holds that key.

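Example: binary search tree
[Figure: binary search tree with root 50, children 10 and 70, and 35 as the right child of 10]
Node 50: Inset = Keyspace
Node 70: Inset = {x | x > 50}
Node 10: Inset = {x | x < 50}
Node 35: Inset = {x | 10 < x < 50}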

Inset, Outset, Keyset
Inset(n) is the subset of Keyspace whose members are either in n or could be reachable (according to the rules of the structure) from n.
Edgeset(n,n’) is the subset of Keyspace directed to descendant n’ of n.
Outset(n) is the union of all edgesets with source n.
Keyset(n) = Inset(n) – Outset(n): the set of keys that are in node n or nowhere.

Notes
Inset(n) = union over all edges (m,n) of Inset(m) ∩ Edgeset(m,n).
Note that Edgeset(n,n’) need not always be a subset of Inset(n). You’ll see why this is good later.

Example: binary search tree
Keyspace is all integers.
Node 50: Inset = Keyspace; Keyset = {50}; Outset = {x | x != 50}
Node 70: Inset = {x | x > 50} = edgeset(node 50, node 70); Keyset = Inset
Node 10: Inset = {x | x < 50}; edgeset(node 10, node 35) = {x | x > 10}; Keyset = Inset – {x | x > 10} = {x | x <= 10}
Node 35: Inset = {x | 10 < x < 50}; Keyset = Inset
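
The definitions of inset, outset, and keyset can be checked mechanically on this tree. The sketch below makes two assumptions that are not in the slides: the keyspace is cut down to the integers 0 through 100 so the sets are finite, and the structure is described purely by its edgesets. The names `KEYSPACE`, `EDGESETS`, `inset`, `outset`, and `keyset` are illustrative only.

```python
KEYSPACE = frozenset(range(0, 101))  # finite stand-in for "all integers"
ROOT = 50

# Edgeset for each (parent, child) edge of the example tree.
EDGESETS = {
    (50, 70): frozenset(x for x in KEYSPACE if x > 50),
    (50, 10): frozenset(x for x in KEYSPACE if x < 50),
    (10, 35): frozenset(x for x in KEYSPACE if x > 10),
}


def inset(n):
    """Union over incoming edges (m, n) of inset(m) intersected with edgeset(m, n)."""
    if n == ROOT:
        return KEYSPACE
    keys = set()
    for (m, child), edge_keys in EDGESETS.items():
        if child == n:
            keys |= inset(m) & edge_keys
    return frozenset(keys)


def outset(n):
    """Union of all edgesets whose source is n."""
    keys = set()
    for (m, _), edge_keys in EDGESETS.items():
        if m == n:
            keys |= edge_keys
    return frozenset(keys)


def keyset(n):
    return inset(n) - outset(n)


for node in (50, 70, 10, 35):
    ks = sorted(keyset(node))
    print(node, ks[0], "..", ks[-1])
```

Note that edgeset(node 10, node 35) contains keys above 50 that are not in inset(node 10); the intersection in `inset` is what keeps inset(node 35) within 11 to 49.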

Structure Goodness Conditions
The keysets of the nodes partition the keyspace. So U {Keyset(n) | n is a node} = Keyspace, and if n != n’ then Keyset(n) is disjoint from Keyset(n’).
Edgesets leaving node n are disjoint.
Let Existkeys(n) be the keys actually present at node n. Existkeys(n) is a subset of Keyset(n).
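
The three conditions translate directly into a checker. This is a sketch only, assuming the keysets, outgoing edgesets, and existkeys are supplied as Python sets over a finite keyspace (0 through 100 here); `check_structure` and its parameters are hypothetical names.

```python
from itertools import combinations


def check_structure(keyspace, keysets, out_edges, existkeys):
    """Return a list of violated goodness conditions (empty means good)."""
    problems = []

    # 1. Keysets partition the keyspace: they cover it and are pairwise disjoint.
    covered = set()
    for ks in keysets.values():
        covered |= ks
    if covered != set(keyspace):
        problems.append("keysets do not cover the keyspace")
    for a, b in combinations(keysets, 2):
        if keysets[a] & keysets[b]:
            problems.append(f"keysets of {a} and {b} overlap")

    # 2. Edgesets leaving the same node are disjoint.
    for n, edges in out_edges.items():
        for e1, e2 in combinations(edges, 2):
            if e1 & e2:
                problems.append(f"edgesets leaving {n} overlap")

    # 3. Keys actually stored at a node lie inside that node's keyset.
    for n, present in existkeys.items():
        if not present <= keysets[n]:
            problems.append(f"existkeys({n}) is not inside keyset({n})")
    return problems


keyspace = set(range(0, 101))
keysets = {50: {50}, 70: set(range(51, 101)),
           10: set(range(0, 11)), 35: set(range(11, 50))}
out_edges = {50: [set(range(0, 50)), set(range(51, 101))],
             10: [set(range(11, 101))]}
existkeys = {50: {50}, 70: {70}, 10: {10}, 35: {35}}

good = check_structure(keyspace, keysets, out_edges, existkeys)
bad = check_structure(keyspace, keysets, out_edges, {**existkeys, 70: {70, 20}})
print(good)
print(bad)
```

The second call places key 20 in node 70, where no search for 20 would ever look, and the checker reports it.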

Structure Goodness Conditions (apply to each root)
In the library, suppose that initially inset(shelf S) = {books | author names begin with “S”}.
After reshelving, outset(S) = {books | author names begin with “Sh” or later}.
At the end, keyset(S) = books with author names starting with “Sa” through “Sg”, and inset(S’) = books with author names starting with “Sh” through “Sz”.

Example: library at beginning
[Figure: catalog Cat pointing to shelves S and A]
Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}
S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Keyset = Inset
A: Inset = {x | x begins with “A”} = edgeset(Cat, A)
…

Example: library after reshelving
[Figure: catalog Cat pointing to shelves S and A; shelf S has a note pointing to shelf S’]
Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}
S: Inset = {x | x begins with “S”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}
S’: Inset = {x | x begins with “Sh” .. “Sz”}; Keyset = Inset
A: Inset = {x | x begins with “A”}
…

Example: library after reshelving and catalog change
[Figure: catalog Cat pointing to shelves S, S’ and A; shelf S still has a note pointing to shelf S’]
Cat: Inset = Keyspace; Outset = Keyspace; Keyset = {}
S: Inset = {x | x begins with “S” through “Sg”} = edgeset(Cat, S); Outset = {x | x begins with “Sh” or greater}
S’: Inset = {x | x begins with “Sh” .. “Sz”} = edgeset(Cat, S’); Keyset = Inset
A: Inset = {x | x begins with “A”}
…

Observe
Without the note from S to S’, once the books have been moved but before the catalog is changed, there would be keys on S’, yet S’ would have a null inset and hence a null keyset. This violates the Existkeys part of the structural condition.
Note also that we can’t eliminate the note from S to S’ even after the catalog is updated. Why?
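
The first observation can be reproduced on a toy library. In the sketch below the four author names, the node names, and the helper functions are all hypothetical; the state modelled is the one after reshelving and before the catalog change, once with the note and once without it.

```python
KEYSPACE = {"Adams", "Sagan", "Shasha", "Szabo"}  # hypothetical author names
NODES = ["Cat", "A", "S", "S'"]


def starts(prefix):
    return {k for k in KEYSPACE if k.startswith(prefix)}


def keysets(edges, root="Cat"):
    """Compute keyset(n) = inset(n) - outset(n) for every node."""
    def inset(n):
        if n == root:
            return set(KEYSPACE)
        keys = set()
        for (m, child), edge_keys in edges.items():
            if child == n:
                keys |= inset(m) & edge_keys
        return keys

    def outset(n):
        keys = set()
        for (m, _), edge_keys in edges.items():
            if m == n:
                keys |= edge_keys
        return keys

    return {n: inset(n) - outset(n) for n in NODES}


moved = {k for k in KEYSPACE if k >= "Sh"}      # books Alice put on S'
catalog_edges = {("Cat", "A"): starts("A"), ("Cat", "S"): starts("S")}

with_note = keysets({**catalog_edges, ("S", "S'"): moved})
without_note = keysets(catalog_edges)
existkeys_s_prime = moved

print(sorted(with_note["S'"]), existkeys_s_prime <= with_note["S'"])
print(sorted(without_note["S'"]), existkeys_s_prime <= without_note["S'"])
```

With the note, the books on S’ fall inside keyset(S’). Without it, keyset(S’) is empty while the shelf holds two books, so the Existkeys condition fails.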

Execution Goodness
For a search for an item B beginning at node m, the following invariant holds:
After any operation of any process, if the search for item B is at node x, then either B is in keyset(x), or there is a path from x to a node y such that B is in keyset(y) and every edge E along that path has B in its edgeset.
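
The invariant is a reachability test and can be written as one. The following sketch assumes keysets and edgesets are given as finite Python sets; `search_is_safe` and the library data are hypothetical and reuse the shelves of the parable with made-up author names.

```python
def search_is_safe(x, key, keysets, edges):
    """True if key is in keyset(x), or some path from x reaches a node y with
    key in keyset(y) while every edge on the path has key in its edgeset."""
    seen = set()
    stack = [x]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if key in keysets[n]:
            return True
        for (m, child), edge_keys in edges.items():
            if m == n and key in edge_keys:
                stack.append(child)
    return False


# Library after reshelving and catalog change (hypothetical author names).
keysets = {"Cat": set(), "A": {"Adams"}, "S": {"Sagan"},
           "S'": {"Shasha", "Szabo"}}
edges = {
    ("Cat", "A"): {"Adams"},
    ("Cat", "S"): {"Sagan"},
    ("Cat", "S'"): {"Shasha", "Szabo"},
    ("S", "S'"): {"Shasha", "Szabo"},   # the note Alice left on shelf S
}

# Bob read the old catalog entry, so his search is currently at shelf S.
print(search_is_safe("S", "Shasha", keysets, edges))

edges_without_note = {e: k for e, k in edges.items() if e != ("S", "S'")}
print(search_is_safe("S", "Shasha", keysets, edges_without_note))
```

Removing the note while a search may still be standing at S breaks the invariant for that search, even though the catalog is already correct.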

Execution Goodness Proof Sketch
Provided the search reaches the node having B in its keyset, the search will find B there or will find it nowhere.
The invariant ensures that the search will not end anywhere else.

Execution Goodness Proof
Why is Bob fine even though the concurrent execution of Bob and Alice is not equivalent to any serial execution?
Because even when Bob is at shelf S, the book B that Bob is looking for is in edgeset(S,S’), and B is in keyset(S’).

Practical Applications
Most sophisticated database management systems use some version of the library parable in their B-trees, hash structures, etc.
Reason: locks need not be held as long and can be held lower in the tree.
B-trees, for example, have links at the leaf level. So a split looks like this:

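B tree simplified (two vals per node)
[Figure: B tree with a root and two leaves; the left leaf holds 1, 7]
Root: Inset = {x | 0 <=90}; Keyset = {}; Outset = Inset
Right leaf: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset
Left leaf: Inset = {x | x < 50}; Keyset = Inset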

B tree insert(32): split left leaf at 15
Only the 1,7 node needs to be locked.
[Figure: the left leaf (1, 7) now links to a new leaf holding 32]
Root: Inset = {x | 0 <=90}; Keyset = {}; Outset = Inset
Right leaf: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset
Left leaf: Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset to the new leaf = {x | x > 15}

Readjust parent (so lock it briefly)
[Figure: the same tree after 15 has been added to the root]
Root: Inset = {x | 0 <=90}; Keyset = {}; Outset = Inset
Right leaf: Inset = {x | x > 50 and x <= 90} = edgeset(node 50, node 70); Keyset = Inset
Left leaf: Inset = {x | x < 50}; Keyset = Inset – {x | x > 15} = {x | x <= 15}; Edgeset to the new leaf = {x | x > 15}
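
The leaf split with a sideways link can be sketched in a few lines. This is a single-threaded illustration under stated assumptions, not the published algorithm: the `Leaf` class, its `high` bound, and both functions are hypothetical, integer keys are assumed so that "x < 50" becomes a high bound of 49, and all locking is omitted. It shows why a searcher that arrives at the old leaf before the parent is readjusted still finds its key.

```python
class Leaf:
    def __init__(self, keys, high, right=None):
        self.keys = sorted(keys)
        self.high = high      # largest key this leaf is responsible for
        self.right = right    # sideways link, playing the role of the note


def split(leaf, at):
    """Move keys greater than `at` to a new right sibling and link to it.
    In a concurrent setting this would run while holding the leaf's lock."""
    sibling = Leaf([k for k in leaf.keys if k > at], leaf.high, leaf.right)
    leaf.right = sibling                          # edgeset {x | x > at}
    leaf.keys = [k for k in leaf.keys if k <= at]
    leaf.high = at                                # keyset shrinks to x <= at
    return sibling


def search_from(leaf, key):
    """Search starting at a leaf, following links while key is out of range."""
    hops = 0
    while leaf is not None and key > leaf.high:
        leaf = leaf.right
        hops += 1
    return (leaf is not None and key in leaf.keys), hops


left = Leaf([1, 7, 32], high=49)    # 32 has just been inserted
sibling = split(left, 15)

# The parent has not been readjusted yet, so a search for 32 still lands here.
print(search_from(left, 32))
print(search_from(left, 7))
```

The parent can be updated later under a brief lock; until then the link keeps every key reachable from the leaf the parent still points to.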

Can Generalize Using Model
The above algorithm is due to Lehman and Yao and is called the B-link algorithm. It took a long journal article to present and prove.
Now we can generalize to any structure: ensure that the structure is good and that the invariant holds during execution.
It is also possible to invent a new algorithm making direct use of the model.

High Concurrency Without Links: Give-up algorithm
Explicitly record the description of the inset of each node in the node.
Search(B) descends. If B is ever not in the inset of the current node, then give up and start over.
This happens rarely enough that performance is as good as B-link for searches.
Less work for deletions.
Proof is immediate.
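
A give-up search might look like the sketch below. Several things here are assumptions made for illustration: insets are stored as half-open integer ranges, parents route through a list of (lo, hi, child) entries, the `interfere` hook stands in for a concurrent writer, and a restart cap is added so the example always terminates. None of these names come from the slides.

```python
class Node:
    def __init__(self, lo, hi, keys=()):
        self.lo, self.hi = lo, hi   # recorded inset: lo <= x < hi
        self.keys = set(keys)
        self.edges = []             # routing entries: (lo, hi, child)


def search(root, key, interfere=None, max_restarts=10):
    """Descend from the root; give up and restart if key leaves the inset."""
    restarts = 0
    while restarts <= max_restarts:
        node = root
        while node.lo <= key < node.hi:
            nxt = None
            for lo, hi, child in node.edges:
                if lo <= key < hi:
                    nxt = child
                    break
            if nxt is None:
                return key in node.keys, restarts
            if interfere is not None:
                interfere()         # a writer may act before we reach nxt
            node = nxt
        restarts += 1               # key not in this node's inset: give up
    raise RuntimeError("too many restarts")


root = Node(0, 91)
left = Node(0, 50, {1, 7, 32})
right = Node(50, 91, {70})
root.edges = [(0, 50, left), (50, 91, right)]

done = []


def writer_splits_left_leaf():
    """Simulated concurrent split of the left leaf at 15, run only once."""
    if done:
        return
    done.append(True)
    new_leaf = Node(16, 50, {k for k in left.keys if k > 15})
    left.keys = {k for k in left.keys if k <= 15}
    left.hi = 16
    root.edges = [(0, 16, left), (16, 50, new_leaf), (50, 91, right)]


found, restarts = search(root, 32, interfere=writer_splits_left_leaf)
print(found, restarts)
```

The searcher reaches the old leaf after the split, sees that 32 is outside the recorded inset, restarts once, and then finds the key through the updated root. No sideways link is needed.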

Conclusion
Simple framework for all search structures.
Handful of concepts: keyspace, inset, edgeset, outset, keyset.
Can be a guide to coding.

Exercise
When can Alice remove the note directing those seeking certain books to go from S to S’?
Try to design a merge algorithm for a B-tree in the give-up setting. Lock as little and as low as possible.