Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.

Motivation
Primary memory (RAM): very fast, but costly. Secondary storage (disk): very cheap, but slow. Problem: a large database must reside partially on disk, but disk operations are very slow. Solution: take advantage of an important disk property: the basic read/write unit is a page (2-4 KB), and we cannot read or write less. Thus, when analyzing database performance, we consider two different measures: CPU time and the number of times we need to access the disk. Besides, B-trees are an interesting type of balanced tree...

B-Trees
B-Tree: a balanced search tree whose nodes can have many children:
– A node $x$ contains $n[x]$ keys and, unless it is a leaf, has $n[x]+1$ children ($c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$).
– The keys in each node are ordered, and relate to their left and right sub-trees as in regular search trees: if $k_i$ is any key stored in the sub-tree rooted at $c_i[x]$, then $k_1 \le key_1[x] \le k_2 \le key_2[x] \le \dots \le key_{n[x]}[x] \le k_{n[x]+1}$.
– All leaves have the same depth $h$ (the tree's height).
– There is a fixed integer $t$ (the minimum degree): every node besides the root has at least $t-1$ keys (i.e. at least $t$ children if it is internal), and every node contains at most $2t-1$ keys ($2t$ children).

Example t=3

B-Trees and disk access (last time...)
Each node contains as many keys as possible without being larger than a single page on disk. Whenever we need to access a node, we load it from the disk (one read operation); after changing a node, we write it back to the disk. For example, say each node contains 1000 keys: then the root has 1001 children, each of which also has 1001 children. Thus with just 2 disk accesses (assuming the root is kept in main memory) we can reach about $1001 \cdot 1001 \cdot 1000 \approx 10^9$ records. Operations are designed to work in one pass from the root to the leaves, so we never need to backtrack. This further reduces the number of disk accesses we make.

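The height of a B-Tree
Theorem: If $n \ge 1$, then for any B-tree of height $h$ with $n$ keys and minimum degree $t \ge 2$: $h \le \log_t \frac{n+1}{2}$.
Proof: Each child of the root has at least $t$ children, each of them also has at least $t$ children, and so on. Thus in every sub-tree of the root there are at least $t^{i-1}$ nodes at depth $i$, for $i = 1, \dots, h$. Each of them contains at least $t-1$ keys. The root contains at least one key and has at least two children, so we have:
$n \ge 1 + 2(t-1)\sum_{i=1}^{h} t^{i-1} = 1 + 2(t^h - 1) = 2t^h - 1$,
which gives $t^h \le (n+1)/2$, and taking $\log_t$ of both sides proves the bound.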
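
As a quick numeric check of the bound, the sketch below finds the largest height h for which 2t^h - 1 <= n still holds. The function name `max_height` is made up for this illustration, and integer arithmetic is used instead of a floating-point logarithm to avoid rounding issues.

```python
def max_height(n: int, t: int) -> int:
    """Largest h with 2*t**h - 1 <= n, i.e. h <= log_t((n + 1) / 2)."""
    if n < 1 or t < 2:
        raise ValueError("the theorem assumes n >= 1 and t >= 2")
    h = 0
    # Keep increasing h while a tree of height h + 1 could still hold only n keys.
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


# A billion keys with minimum degree 1001 never need more than height 2.
print(max_height(10**9, 1001))  # 2
```

With t = 2 the same billion keys allow a much taller tree, which is why a large minimum degree matters when every level costs a disk access.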

B-Tree Search
Search is done in the regular way: in each node, we find the sub-tree in which our value might be, and recursively search for it there.
Performance: $O(th) = O(t \log_t n)$ total run-time, out of which $O(h) = O(\log_t n)$ are disk access operations.
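
The search described above can be sketched in a few lines of Python. This is an in-memory illustration only: the `Node` class and its field names are assumptions made for the example, and following a child reference stands in for one disk read. The linear scan inside each node matches the O(t) per-node cost quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # sorted keys of this node
    children: list = field(default_factory=list)  # empty list for a leaf


def search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None."""
    while True:
        # Linear scan: find the first key that is not smaller than k.
        i = 0
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:      # reached a leaf without finding k
            return None
        x = x.children[i]       # in a disk-based tree: one page read


# A small hand-built tree: one root with three leaves.
root = Node([10, 20], [Node([3, 7]), Node([12, 17]), Node([25, 34])])
print(search(root, 17) is not None)  # True
print(search(root, 8))               # None
```

The loop visits one node per level, so the number of simulated disk reads is at most the height h.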

B-Tree Split
Used for insertion. This operation ensures that a node has fewer than $2t-1$ keys. What we do is split a full node into two nodes, each with $t-1$ keys. The extra key goes into the node's parent (we assume the parent is not full). To split a node $x$ (see the next slide for an illustration), take $key_t[x]$ (notice that it is the median key). All smaller keys (exactly $t-1$ of them) form one new (legal) node, and the same goes for all larger keys. $key_t[x]$ goes into $x$'s parent. If the node we split is the root, then a new root is created. This new root contains only one key.

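B-tree split
Figure: a full node with median key m hangs below its parent, between the parent's keys x and y. After the split, m sits in the parent between x and y, and the full node has become two nodes with t-1 keys each.
Notice that the parent has many other sub-trees that don't change.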

Example A full node (t=3)

B-Tree Insert
We insert a key only into a leaf. We start from the root and go down to the appropriate leaf. On the way, before going down to the next node, we check whether it is full. If so, we split it (its parent is not full, because we checked this before going down to the parent). When we reach the right leaf, we know that it is not full, so we can simply insert the new value into it. Notice that we may need to split the root, if it is full. In this case, the tree's height increases (but the tree remains completely balanced!). That's why we say that a B-tree grows from the root, in contrast to most trees, which grow from the leaves...
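
The one-pass insertion with preemptive splits can be sketched as follows. This is a minimal in-memory illustration, not the course's reference implementation: the class names `Node` and `BTree` and the helper `inorder` are assumptions made for the example, and disk reads and writes are not modelled.

```python
class Node:
    def __init__(self, leaf=True):
        self.keys = []       # sorted keys, at most 2t-1 of them
        self.children = []   # len(keys) + 1 children, empty for a leaf
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t           # minimum degree
        self.root = Node()

    def _split_child(self, parent, i):
        """Split the full node parent.children[i]; parent must not be full."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]           # key_t[x] in 1-based notation
        right.keys = full.keys[t:]          # the t-1 larger keys
        full.keys = full.keys[:t - 1]       # the t-1 smaller keys
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)       # the median moves up
        parent.children.insert(i + 1, right)

    def insert(self, k):
        t = self.t
        if len(self.root.keys) == 2 * t - 1:
            # Full root: create a new root, so the tree grows at the top.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        x = self.root
        while not x.leaf:
            i = sum(1 for key in x.keys if key < k)   # child to descend into
            if len(x.children[i].keys) == 2 * t - 1:
                self._split_child(x, i)               # split before descending
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        # x is a leaf that is guaranteed not to be full.
        x.keys.insert(sum(1 for key in x.keys if key < k), k)


def inorder(x):
    """All keys of the sub-tree rooted at x, in sorted order."""
    if x.leaf:
        return list(x.keys)
    out = []
    for i, key in enumerate(x.keys):
        out += inorder(x.children[i]) + [key]
    return out + inorder(x.children[-1])


tree = BTree(3)
for key in [1, 2, 3, 4, 5, 6]:
    tree.insert(key)
print(tree.root.keys)  # [3]
```

With t=3 the root fills up after five keys, so inserting the sixth key splits it: the median 3 becomes the only key of the new root, and 6 lands in the right leaf. Because every full node is split on the way down, the parent of a node being split always has room for the median.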

Example
We start with an empty tree (t=3). (I) Inserting 3, 7, 34, 10, ... (II) Inserting 25 splits the root. (III) Inserting 40 and ... (IV) Inserting 17 splits the right leaf.

B-Tree Insert (cont.)
Performance:
– Split: three disk accesses (to write the 2 new nodes and the parent); $O(t)$ total run time.
– Insert: $O(h)$ disk accesses; $O(t \log_t n)$ total run time.
Requires $O(1)$ pages in main memory.

Problem Set 1 Solutions
2.b Solution
– This recurrence is defined for powers of 2,
– We use here the general Master theorem, which deals with
– In this case
– We will claim that (shown two slides ahead)

Problem Set 1 Solutions (cont)
– The first case of the Master theorem applies: for some, because
– Using the Master theorem we have

Problem Set 1 Solutions (cont)
– Claim : Recall that This implies that for any constant substitute, and get Now set and we get that If we multiply by n we get

Problem Set 1 Solutions (cont)
3. Finding the missing integer
– All the integers from 0 to n, except one, are stored in an array A[1…n]. We want to find the missing integer.
– In this problem we cannot access an entire integer in A with a single operation.
– The elements in A are represented as binary strings.
– The only operation for accessing the integers is to ask for the i-th bit of A[j].
– Each such query takes constant time.
– Give an algorithm that finds the missing integer using $O(n)$ queries.

Problem Set 1 Solutions (cont)
Assumptions:
– All integers are represented with the same number of bits. Smaller numbers are padded with 0's on the left.
– $N = n + 1 = 2^k$ for some integer $k$, so the values $0, \dots, n$ use exactly $k$ bits.
Let x be the missing integer.

Problem Set 1 Solutions (cont)
The principle:
– A binary search.
– Run over the first (most significant) bit of all the integers in A (takes n queries).
– Count the number of 0's you see on the way.
– If x < N/2, then one zero is missing (N/2-1 0's).
– If x >= N/2, then no zero is missing (N/2 0's).
– After one pass over A we know the first bit of x, and hence whether x < N/2 or not. Only half the problem remains to be searched (only the elements with the same first bit as x).
– One pass over A is one step in the binary search.
– Proceed to query the next bit.

Problem Set 1 Solutions (cont)
The problem: keeping track of all the elements of A that are still relevant for the search.
Solution:
– Use a linked list L.
– Each element y in L contains two fields (besides the usual pointers of a linked list): y.pA, a pointer to an element in A, and y.bit, one bit of that element in A.
– Run through A by following the pointers in the linked list.
– Update the bits in the elements of L as you go (in every pass).
– Once the 0's have been counted at the end of the pass, delete from the list the elements pointing to non-relevant elements of A.

Problem Set 1 Solutions (cont)
The run time of the algorithm:
– Each pass goes a constant number of times through the linked list and performs a constant number of operations for each element: $O(n)$ steps.
– In each step the list is shortened by half.
– $T(n) = T(n/2) + dn$ for some constant d.
– Solution with Master theorem rule number 3 (a=1, b=2): $T(n) = \Theta(n)$.
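
The whole algorithm can be sketched in Python under the assumptions stated earlier (n + 1 = 2^k values, all written with k bits). Several details are choices made for this illustration: the function name `find_missing`, the 0-based array indexing, the inner helper `bit` that models the single allowed query and counts its uses, and a plain Python list of indices standing in for the linked list L.

```python
def find_missing(A, k):
    """A holds all integers of 0 .. 2**k - 1 except one; return (missing, queries)."""
    queries = 0

    def bit(j, i):
        # The only permitted access: bit i of A[j] (i = k-1 is the leftmost bit).
        nonlocal queries
        queries += 1
        return (A[j] >> i) & 1

    candidates = list(range(len(A)))   # indices of A that are still relevant
    x = 0
    for i in range(k - 1, -1, -1):     # from the first (leftmost) bit to the last
        zeros, ones = [], []
        for j in candidates:           # one pass over the surviving elements
            (ones if bit(j, i) else zeros).append(j)
        half = 1 << i                  # values in the current range whose bit i is 0
        if len(zeros) < half:          # a zero is missing, so bit i of x is 0
            candidates = zeros
        else:                          # no zero is missing, so bit i of x is 1
            x |= 1 << i
            candidates = ones
    return x, queries


A = [5, 0, 7, 2, 1, 6, 3]              # the integers 0 .. 7 without 4
print(find_missing(A, 3))              # (4, 11)
```

The passes look at 7, 3 and 1 elements, 11 queries in total. In general the number of queries is n + (n-1)/2 + ... + 1 < 2n, in line with the recurrence T(n) = T(n/2) + dn.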

Problem Set 1 Solutions (cont)
2.d Solution
– This recurrence is defined for powers of 2,
– Let us try to see how it behaves. Opening it up, we have:
– We see that the answers are powers of 2.

Problem Set 1 Solutions (cont)
– We now observe that if we write n as a power of 2 as well, we get a simpler rule.
– We thus can guess:
– The recurrence, written in terms of powers of 2, is:
– Let us prove by induction on k that is indeed the solution.

Problem Set 1 Solutions (cont)
– Base case: For k=0,1 it is true because of the initialization of the recurrence.
– Induction hypothesis:
– Induction step: