I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.


Slide 2: I/O-Model
Parameters:
– N = number of elements in the problem instance
– B = number of elements that fit in a disk block
– M = number of elements that fit in main memory
– T = output size in a searching problem
We often assume that $M > B^2$.
I/O: movement of one block between main memory and disk.
[Figure: disk D, processor P and main memory M, connected by block I/O]

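Slide 3: Fundamental Bounds [AV88]
               Internal      External
Scanning:      $N$           $N/B$
Sorting:       $N \log N$    $\frac{N}{B} \log_{M/B} \frac{N}{B}$
Permuting:     $N$           $\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\}$
  – List rank
Searching:     $\log_2 N$    $\log_B N$
Note:
– Permuting is not linear
– Permuting and sorting bounds are equal in all practical cases
– The B factor is VERY important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
– Cannot sort optimally with a search tree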
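To get a feel for the size of the B factor, the external-memory bounds can be evaluated numerically. The following is a minimal sketch, not part of the slides: the function name `io_bounds` is hypothetical, constants hidden by the O-notation are ignored, and N, M, B are assumed to satisfy N > M > B > 1.

```python
import math


def io_bounds(N, M, B):
    """Evaluate the external-memory bounds (constants ignored).

    N: elements in the instance, M: elements that fit in memory,
    B: elements per disk block. Assumes N > M > B > 1.
    """
    n = N / B  # input size measured in blocks
    m = M / B  # memory size measured in blocks
    sort = n * (math.log2(n) / math.log2(m))  # (N/B) log_{M/B} (N/B)
    return {
        "scan": n,                                # N/B
        "sort": sort,
        "permute": min(N, sort),                  # min{N, sort bound}
        "search": math.log2(N) / math.log2(B),    # log_B N
    }


if __name__ == "__main__":
    # Example values chosen only for illustration.
    for name, value in io_bounds(N=2**30, M=2**20, B=2**10).items():
        print(f"{name:8s} {value:,.0f}")
```

With these example parameters the sorting bound is only twice the scanning bound, while both are roughly a factor B below N.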

Slide 4: (a,b)-tree
T is an (a,b)-tree ($a \ge 2$ and $b \ge 2a-1$) if:
– All leaves are on the same level (and contain between a and b elements)
– Except for the root, all nodes have degree between a and b
– The root has degree between 2 and b
An (a,b)-tree uses linear space and has height $O(\log_a N)$.
Choosing $a, b = \Theta(B)$, each node/leaf is stored in one disk block $\Rightarrow$ $O(N/B)$ space and $O(\log_B N + T/B)$ query.
[Figure: example (a,b)-tree]

Slide 5: (a,b)-Tree Insert
Insert:
  Search and insert element in leaf v
  WHILE v has b+1 elements/children:
    Split v: make nodes v' and v'' with $\lfloor (b+1)/2 \rfloor \le b$ and $\lceil (b+1)/2 \rceil \ge a$ elements
    Insert element (ref) in parent(v) (make new root if necessary)
    v = parent(v)
Insert touches $O(\log_a N)$ nodes.
[Figure: node v split into v' and v'']
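The insert procedure can be sketched in memory as follows. This is an illustrative sketch under assumptions that are not in the slides: the class and method names are hypothetical, elements live in the leaves, internal nodes store routing keys (the smallest key of each child except the first), and the recursion returns a split to the caller instead of following parent pointers.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []  # leaf: elements; internal: routing keys
        self.children = children                      # None marks a leaf


class ABTree:
    def __init__(self, a, b):
        assert a >= 2 and b >= 2 * a - 1
        self.a, self.b = a, b
        self.root = Node()

    def insert(self, x):
        split = self._insert(self.root, x)
        if split is not None:                # the root was split: make a new root
            sep, right = split
            self.root = Node([sep], [self.root, right])

    def _insert(self, v, x):
        if v.children is None:               # leaf: insert the element itself
            bisect.insort(v.keys, x)
            size = len(v.keys)
        else:                                # internal: descend, then absorb a child split
            i = bisect.bisect_right(v.keys, x)
            split = self._insert(v.children[i], x)
            if split is not None:
                sep, right = split
                v.keys.insert(i, sep)
                v.children.insert(i + 1, right)
            size = len(v.children)
        return self._split(v) if size > self.b else None

    def _split(self, v):
        """Split a node holding b+1 elements/children into two halves."""
        if v.children is None:
            mid = len(v.keys) // 2           # floor((b+1)/2) elements stay in v
            right = Node(v.keys[mid:])
            v.keys = v.keys[:mid]
            return right.keys[0], right
        mid = len(v.children) // 2
        sep = v.keys[mid - 1]                # routing key that moves up to the parent
        right = Node(v.keys[mid:], v.children[mid:])
        v.keys, v.children = v.keys[:mid - 1], v.children[:mid]
        return sep, right

    def items(self):
        """Return all elements in sorted order (left-to-right leaf scan)."""
        out, stack = [], [self.root]
        while stack:
            v = stack.pop()
            if v.children is None:
                out.extend(v.keys)
            else:
                stack.extend(reversed(v.children))
        return out
```

Because $b \ge 2a-1$, both halves of a split hold at least $a$ elements, so only nodes on the search path are ever touched.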

Slide 6: (a,b)-Tree Delete
Delete:
  Search and delete element from leaf v
  WHILE v has a-1 elements/children:
    Fuse v with sibling v': move children of v' to v
    Delete element (ref) from parent(v) (delete root if necessary)
    If v has $> b$ (and $\le a+b-1 < 2b$) children, split v
    v = parent(v)
Delete touches $O(\log_a N)$ nodes.
[Figure: node v fused with its sibling]
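The size arithmetic behind fuse-then-split can be checked directly. A small sketch, assuming only the size rules stated above; `fuse_result` is a hypothetical helper that tracks node sizes and ignores the actual elements.

```python
def fuse_result(a, b, sibling_size):
    """Sizes of the node(s) left after fusing an underfull node with a sibling.

    The underfull node holds a-1 elements/children and the sibling holds
    between a and b. If the fused node exceeds b it is split in two.
    """
    assert a >= 2 and b >= 2 * a - 1
    assert a <= sibling_size <= b
    fused = (a - 1) + sibling_size           # at most a + b - 1 < 2b
    if fused > b:
        left = fused // 2
        return [left, fused - left]          # two nodes after the split
    return [fused]                           # a single fused node


if __name__ == "__main__":
    for s in range(2, 5):
        print(s, fuse_result(2, 4, s))
```

Every resulting size lies between a and b again, which is why one fuse (plus at most one split) restores the invariant at that level.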

Slide 7: (a,b)-Tree
(a,b)-tree properties:
– If $b = 2a-1$, one update can cause many rebalancing operations
– If $b \ge 2a$, an update causes only $O(1)$ rebalancing operations amortized
– If $b > 2a$, $O\left(\frac{1}{b/2 - a}\right)$ rebalancing operations amortized
  * Both somewhat hard to show
– If $b = 4a$, it is easy to show that an update causes $O(\frac{1}{a} \log_a N)$ rebalancing operations amortized
  * After a split during insert, a leaf contains $\approx 4a/2 = 2a$ elements
  * After a fuse during delete, a leaf contains between $\approx 2a$ and $\approx 5a$ elements (split if more than $3a$ $\Rightarrow$ between $\frac{3}{2}a$ and $\frac{5}{2}a$)
[Figure: insert followed by delete in a (2,3)-tree]

Slide 8: B-Tree
B-trees: (a,b)-trees with $a, b = \Theta(B)$
– $O(N/B)$ space
– $O(\log_B N + T/B)$ query
– $O(\log_B N)$ update
B-trees with elements in the leaves are sometimes called B+-trees
Construction in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os:
– Sort elements and construct leaves
– Build tree level-by-level bottom-up
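The bottom-up construction can be sketched in memory as follows. This is a simplified illustration, not the I/O-efficient algorithm itself: `build_bottom_up` and `fanout` are hypothetical names, nodes are plain Python lists, routing keys are omitted, and the last node on a level may end up underfull (a real implementation would rebalance it with its neighbor).

```python
def build_bottom_up(sorted_elems, fanout):
    """Group sorted elements into leaves, then build each level from the one below.

    Returns the list of levels, leaves first; each node is a list of its
    elements (leaf) or of its child nodes (internal node).
    """
    assert fanout >= 2
    # Leaves: consecutive runs of `fanout` sorted elements.
    level = [sorted_elems[i:i + fanout] for i in range(0, len(sorted_elems), fanout)]
    levels = [level]
    # Each pass scans the level below once and groups `fanout` nodes per parent.
    while len(level) > 1:
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        levels.append(level)
    return levels


if __name__ == "__main__":
    levels = build_bottom_up(sorted(range(1000)), fanout=10)
    print([len(nodes) for nodes in levels])
```

Each level is produced by a single scan of the level below, so after the initial sort the level sizes shrink geometrically and the sort dominates the cost.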

Slide 9: B-tree
B-tree with branching parameter b and leaf parameter k ($b, k \ge 8$):
– All leaves are on the same level and contain between $k/4$ and $k$ elements
– Except for the root, all nodes have degree between $b/4$ and $b$
– The root has degree between 2 and b
B-tree with leaf parameter $k = \Omega(B)$:
– $O(N/B)$ space
– Height $O(\log_b \frac{N}{B})$
– $O(1/k)$ amortized leaf rebalance operations
– $O(\frac{1}{b \cdot k} \log_b \frac{N}{B})$ amortized internal node rebalance operations
B-tree with branching parameter $B^c$, $0 < c \le 1$, and leaf parameter B:
– Space $O(N/B)$, updates $O(\log_B N)$, queries $O(\log_B N + T/B)$

Slide 10: Secondary Structures
When secondary structures are used, a rebalance on v often requires $O(w(v))$ I/Os ($w(v)$ is the weight of v, i.e. the number of elements stored below v)
– If $\Omega(w(v))$ inserts have to be made below v between operations $\Rightarrow$ $O(1)$ amortized split bound $\Rightarrow$ $O(\log_B N)$ amortized insert bound
Nodes in a standard B-tree do not have this property
[Figure: example tree]

Slide 11: BB[α]-tree
In internal memory, BB[α]-trees have the desired property
Defined using weight constraints:
– For every node v, the weight of its left child is between $\alpha \cdot w(v)$ and $(1-\alpha) \cdot w(v)$
$\Rightarrow$ Height $O(\log N)$
If $\frac{2}{11} < \alpha < 1 - \frac{1}{2}\sqrt{2}$, rebalancing can be performed using rotations
Seems hard to implement BB[α]-trees I/O-efficiently
[Figure: rotation exchanging nodes x and y]

Slide 12: Weight-balanced B-tree
Idea: combination of B-tree and BB[α]-tree
– Weight constraint on nodes instead of degree constraint
– Rebalancing performed using split/fuse as in B-tree
Weight-balanced B-tree with parameters b and k ($b > 8$, $k \ge 8$):
– All leaves are on the same level and contain between $k/4$ and $k$ elements
– An internal node v at level $l$ has $w(v) < b^l k$
– Except for the root, an internal node v at level $l$ has $w(v) > \frac{1}{4} b^l k$
– The root has more than one child
[Figure: nodes at level l-1 and level l]

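Slide 13: Weight-balanced B-tree
Every internal node has degree between $\frac{\frac{1}{4} b^l k}{b^{l-1} k} = \frac{1}{4} b$ and $\frac{b^l k}{\frac{1}{4} b^{l-1} k} = 4b$
$\Rightarrow$ Height $O(\log_b \frac{N}{k})$
External memory:
– Choose $4b = B$ (or even $B^c$ for $0 < c \le 1$)
– $k = B$
$\Rightarrow$ $O(N/B)$ space, $O(\log_B N + T/B)$ query
[Figure: nodes at level l-1 and level l]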
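The degree bounds follow from dividing the weight range of a level-l node by the weight range of its level-(l-1) children. A short sketch with exact arithmetic, taking the weight range at level l to be $\frac{1}{4} b^l k$ to $b^l k$ (leaves are level 0); the function names are hypothetical.

```python
from fractions import Fraction


def weight_range(level, b, k):
    """(lower, upper) weight of a non-root node at the given level."""
    upper = b**level * k
    return Fraction(upper, 4), Fraction(upper)


def degree_range(level, b, k):
    """(lower, upper) number of children of a non-root node at level >= 1."""
    node_lo, node_hi = weight_range(level, b, k)
    child_lo, child_hi = weight_range(level - 1, b, k)
    # Fewest children: lightest node made of heaviest children, and vice versa.
    return node_lo / child_hi, node_hi / child_lo


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, degree_range(level, b=16, k=8))
```

The result is independent of the level and of k: the degree always lies between b/4 and 4b, which is why choosing 4b = B lets a node fit in one block.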

Slide 14: Weight-balanced B-tree
Insert:

Slide 15: Weight-balanced B-tree
Delete:

Slide 16: Weight-balanced B-tree
Weight-balanced B-tree with branching parameter b and leaf parameter $k = \Omega(B)$:
– $O(N/B)$ space
– Height $O(\log_b \frac{N}{k})$
– $O(\log_b \frac{N}{k})$ rebalancing operations after an update
– $\Omega(w(v))$ updates below v between consecutive operations on v
Weight-balanced B-tree with branching parameter $B^c$ and leaf parameter B:
– Updates in $O(\log_B N)$ and queries in $O(\log_B N + T/B)$ I/Os
Construction bottom-up in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os

Slide 17: Persistent B-tree
In some applications we are interested in being able to access previous versions of a data structure
– Databases
– Geometric data structures (later)
Partial persistence:
– Update current version (getting new version)
– Query all versions
We would like to have a partially persistent B-tree with
– $O(N/B)$ space, where N is the number of updates performed
– $O(\log_B N)$ update
– $O(\log_B N + T/B)$ query in any version

Slide 18: Persistent B-tree
Easy way to make a B-tree partially persistent:
– Copy the structure at each operation
– Maintain a "version-access" structure (B-tree)
Good $O(\log_B N + T/B)$ query in any version, but
– $O(N/B)$ I/O update
– $O(N^2/B)$ space
[Figure: one copy of the structure per version i, i+1, i+2; an update creates version i+3]

Slide 19: Persistent B-tree
Idea: elements are augmented with an "existence interval" and stored in one structure
Persistent B-tree with parameter b ($b > 16$):
– Directed graph
  * Nodes contain elements augmented with existence interval
  * At any time t, the nodes with elements alive at time t form a B-tree with leaf and branching parameter b
– B-tree with leaf and branching parameter b on the indegree-0 nodes (the roots)
$\Rightarrow$ If $b = B$:
– Query at any time t in $O(\log_B N + T/B)$ I/Os
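The notion of an element being "alive at time t" can be made concrete with a small sketch. The names below (`Versioned`, `born`, `died`, `snapshot`) are hypothetical, the existence interval is assumed to be half-open, and the flat list stands in for the node structure only to show the filtering rule.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    key: int
    born: int                 # version in which the element was inserted
    died: float = math.inf    # version in which it was deleted (inf: still alive)

    def alive_at(self, t):
        """True if version t falls inside the existence interval [born, died)."""
        return self.born <= t < self.died


def snapshot(elements, t):
    """Keys of the elements that make up version t, in sorted order."""
    return sorted(e.key for e in elements if e.alive_at(t))


if __name__ == "__main__":
    elems = [Versioned(5, born=1), Versioned(7, born=2, died=4), Versioned(3, born=3)]
    for t in range(1, 5):
        print(t, snapshot(elems, t))
```

A delete never removes an element; it only closes its interval, which is what allows old versions to remain queryable.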

Slide 20: Persistent B-tree: Updates
Updates are performed as in a B-tree
To obtain linear space we maintain the new-node invariant:
– A new node contains between $\frac{3}{8} b$ and $\frac{7}{8} b$ alive elements and no dead elements

Slide 21: Persistent B-tree Insert
Search for relevant leaf u and insert new element
If u contains $b+1$ elements: block overflow
– Version split: mark u dead and create new node u' with the x alive elements
– If $x > \frac{7}{8} b$: strong overflow
– If $x < \frac{3}{8} b$: strong underflow
– If $\frac{3}{8} b \le x \le \frac{7}{8} b$, then recursively update parent(u): delete reference to u and insert reference to u'

Slide 22: Persistent B-tree Insert
Strong overflow ($x > \frac{7}{8} b$)
– Split u' into u' and u'' with $x/2$ elements each (both satisfy the new-node invariant)
– Recursively update parent(u): delete reference to u and insert references to u' and u''
Strong underflow ($x < \frac{3}{8} b$)
– Merge the x elements with the y live elements obtained by a version split on a sibling ($x + y \ge \frac{1}{2} b - 1$)
– If $x + y > \frac{7}{8} b$ then (strong overflow) perform a split into nodes with $(x+y)/2$ elements each
– Recursively update parent(u): delete two references and insert one or two references

Slide 23: Persistent B-tree Delete
Search for relevant leaf u and mark element dead
If u contains $x < \frac{1}{4} b$ alive elements: block underflow
– Version split: mark u dead and create new node u' with the x alive elements
– Strong underflow ($x < \frac{3}{8} b$): merge (version split on sibling) and possibly split (strong overflow)
– Recursively update parent(u): delete two references and insert one or two references

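Slide 24: Persistent B-tree
[Figure: flowchart of the update algorithm; each outcome is labelled with the number of references deleted from and inserted into the parent]
Insert: no block overflow: done (0,0). Block overflow: version split, then
– no strong overflow or underflow: done (-1,+1)
– strong overflow: split, done (-1,+2)
– strong underflow: merge, done (-2,+1); if the merge gives a strong overflow: split, done (-2,+2)
Delete: no block underflow: done (0,0). Block underflow: version split, then strong underflow handled as above.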
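The case analysis after a version split can be sketched as a small decision function. This is only an illustration under stated assumptions: the thresholds $\frac{3}{8} b$ and $\frac{7}{8} b$ are taken as the new-node invariant, the function name is hypothetical, and only node sizes are tracked, not elements or references themselves.

```python
def after_version_split(x, b, sibling_alive=None):
    """Outcome of rebalancing a node whose version split left x alive elements.

    Returns (sizes of the new nodes, (references deleted, references inserted)
    in the parent). `sibling_alive` is the number of alive elements in the
    sibling and is only needed in the strong-underflow case.
    """
    if 8 * x > 7 * b:                        # strong overflow: split in two
        return [x // 2, x - x // 2], (-1, +2)
    if 8 * x < 3 * b:                        # strong underflow: merge with sibling
        if sibling_alive is None:
            raise ValueError("strong underflow needs the sibling's alive count")
        total = x + sibling_alive
        if 8 * total > 7 * b:                # merged node overflows: split again
            return [total // 2, total - total // 2], (-2, +2)
        return [total], (-2, +1)
    return [x], (-1, +1)                     # new node already satisfies the invariant


if __name__ == "__main__":
    b = 32
    print(after_version_split(33, b))
    print(after_version_split(20, b))
    print(after_version_split(7, b, sibling_alive=10))
    print(after_version_split(7, b, sibling_alive=30))
```

In every branch the resulting nodes hold between 3b/8 and 7b/8 alive elements, provided the sibling itself holds at least b/4 of them.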

Slide 25: Persistent B-tree Analysis
Update: $O(\log_B N)$
– Search and “rebalance” on one root-leaf path
Space: $O(N/B)$
– At least $\frac{1}{8} b$ updates in a leaf during its existence interval
– When leaf u dies:
  * At most two other nodes are created
  * At most one block over/underflow one level up (in parent(u))
$\Rightarrow$
– During N updates we create:
  * $O(N/b)$ leaves
  * $O\left(N/(b/8)^{i+1}\right)$ nodes $i$ levels up
$\Rightarrow$ Space: $\sum_{i \ge 0} O\left(N/(b/8)^{i+1}\right) = O(N/b)$
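The last step of the space argument is a geometric sum. As a sketch, assume that at most $cN/r^{i+1}$ nodes are created $i$ levels above the leaves, where $c$ is an unspecified constant introduced here for illustration and $r = b/8 > 2$ because $b > 16$:

```latex
\sum_{i \ge 0} \frac{cN}{r^{\,i+1}}
  = cN \cdot \frac{1/r}{1 - 1/r}
  = \frac{cN}{r - 1}
  = \frac{8cN}{b - 8}
  \le \frac{16\,cN}{b}
  = O\!\left(\frac{N}{b}\right),
\qquad r = \frac{b}{8} > 2 .
```

The inequality uses $b - 8 > b/2$, which holds for $b > 16$; with $b = B$ the structure therefore occupies $O(N/B)$ blocks.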

Slide 26: Other B-tree Variants
Level-balanced B-trees
– Global instead of local balancing strategy
– Whole subtrees rebuilt when too many nodes on a level
– Used when parent pointers and divide/merge operations are needed
String B-trees
– Used to maintain and search (variable-length) strings

Slide 27: B-tree Construction
In internal memory we can sort N elements in $O(N \log N)$ time using a balanced search tree:
– Insert all elements one-by-one (construct tree)
– Output in sorted order using in-order traversal
The same algorithm using a B-tree uses $O(N \log_B N)$ I/Os
– A factor of $O\left(B \cdot \frac{\log_B N}{\log_{M/B} (N/B)}\right)$ non-optimal
As discussed, we could build a B-tree bottom-up in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
– But what about the persistent B-tree?
– In general we would like to have dynamic data structures to use in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ algorithms $\Rightarrow$ $O(\frac{1}{B} \log_{M/B} \frac{N}{B})$ I/O operations
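The gap between sorting by repeated B-tree insertion and the optimal sorting bound can be put into numbers. A rough sketch with hypothetical function names and example parameters, ignoring the constants hidden by the O-notation:

```python
import math


def log_base(x, base):
    return math.log2(x) / math.log2(base)


def insertion_sort_ios(N, B):
    """N inserts into a B-tree at about log_B N I/Os each."""
    return N * log_base(N, B)


def optimal_sort_ios(N, M, B):
    """(N/B) log_{M/B} (N/B), the external sorting bound."""
    return (N / B) * log_base(N / B, M / B)


def per_operation_budget(N, M, B):
    """Amortized I/Os each of N operations may use while staying within the sorting bound."""
    return optimal_sort_ios(N, M, B) / N


if __name__ == "__main__":
    N, M, B = 2**30, 2**20, 2**10            # example values only
    ratio = insertion_sort_ios(N, B) / optimal_sort_ios(N, M, B)
    print(f"non-optimality factor: {ratio:.0f}")
    print(f"amortized budget per operation: {per_operation_budget(N, M, B):.5f} I/Os")
```

The budget per operation is far below one I/O, so a structure meeting it cannot afford to touch the disk on every update and has to batch its work.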