CS 583 Analysis of Algorithms

Presentation on theme: "CS 583 Analysis of Algorithms"— Presentation transcript:

1 CS 583 Analysis of Algorithms
B-Trees
11/23/2018 CS583 Fall'06: B-Trees

2 Outline
Data Structures on Secondary Storage
    Magnetic disks
    Efficient operations
B-Trees
    Definitions
    Searching
    Inserting
Self-test
    18.1-1, , ,

3 Magnetic Disks
The main memory of a computer system consists of silicon memory chips. It is typically two orders of magnitude more expensive per bit than magnetic storage. Magnetic disks are cheaper and have higher capacity than main memory. However, they are much slower because they have moving parts. To amortize the time spent on mechanical movement, a disk accesses several items at a time. Information is divided into equal-size pages. Pages appear as consecutive bits within cylinders. Once the read/write head is positioned at the desired page, large amounts of data can be accessed quickly.

4 Disk Operations
When x is an object that resides on disk, the following pseudocode conventions are used:
    x = <a pointer to some object>
    Disk-Read(x)
    <access and modify fields of x>
    Disk-Write(x)
In most systems the running time of a B-tree algorithm is determined mainly by the number of Disk-Read and Disk-Write operations. Hence a B-tree node is usually as large as a whole disk page.
Example: a B-tree with a branching factor of 1001 and height 2 can store more than one billion keys. Since the root node is kept in main memory, at most two disk accesses are needed to find any key.
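The capacity figure in the example can be checked with a few lines of Python. This is only a sketch: the function name `max_keys` is made up for illustration, and it assumes every node is completely full, so each node holds one key fewer than the branching factor.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Maximum number of keys in a B-tree in which every node is full."""
    keys_per_node = branching_factor - 1
    # When every node is full, depth d holds branching_factor**d nodes.
    nodes = sum(branching_factor ** d for d in range(height + 1))
    return keys_per_node * nodes


print(max_keys(1001, 2))  # 1003003000
```

With branching factor 1001 and height 2 there are 1 + 1001 + 1001^2 nodes of 1000 keys each, which is just over one billion keys.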

5 B-tree Definition
We assume that any satellite information associated with a key is stored in the same node as the key. A B-tree is a rooted tree with the following properties:
Every node x has the following fields:
    n[x], the number of keys stored in x.
    The n[x] keys themselves, stored in non-decreasing order: $key_1[x] \le key_2[x] \le \dots \le key_{n[x]}[x]$.
    leaf[x], which is true if x is a leaf and false otherwise.
Each internal node x also contains n[x]+1 pointers to its children: $c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$.

6 B-tree Definition (cont.)
Properties (cont.):
The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root c_i[x], then $k_1 \le key_1[x] \le k_2 \le key_2[x] \le \dots \le key_{n[x]}[x] \le k_{n[x]+1}$.
All leaves have the same depth, which is the tree's height h.
There are lower and upper bounds on the number of keys in a node. They are expressed in terms of an integer t >= 2 called the minimum degree:
    Every node other than the root must have at least t-1 keys.
    Every node can contain at most 2t-1 keys. We say the node is full if it contains exactly 2t-1 keys.
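The properties above can be turned into a small checker. The sketch below is illustrative only: the `Node` class and the `height_if_valid` function are hypothetical names, keys are assumed to be integers held in Python lists (0-based) rather than in disk pages, and a non-empty tree is assumed, so the root must hold at least one key.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def height_if_valid(x: Node, t: int, lo: Optional[int] = None,
                    hi: Optional[int] = None, is_root: bool = True) -> int:
    """Return the height of the subtree rooted at x, or raise ValueError."""
    n = len(x.keys)
    min_keys = 1 if is_root else t - 1  # assumption: the tree is non-empty
    if not min_keys <= n <= 2 * t - 1:
        raise ValueError("key count out of bounds")
    if any(a > b for a, b in zip(x.keys, x.keys[1:])):
        raise ValueError("keys not in non-decreasing order")
    if (lo is not None and x.keys[0] < lo) or (hi is not None and x.keys[-1] > hi):
        raise ValueError("key outside the range set by the parent")
    if not x.children:
        return 0
    if len(x.children) != n + 1:
        raise ValueError("internal node needs n + 1 children")
    # Child i may only hold keys between separator i-1 and separator i.
    bounds = [lo] + x.keys + [hi]
    heights = {height_if_valid(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(x.children)}
    if len(heights) != 1:
        raise ValueError("leaves at different depths")
    return heights.pop() + 1


root = Node([10], [Node([1, 5]), Node([12, 20, 30])])
print(height_if_valid(root, t=2))  # 1
```

Raising an exception on the first violated property keeps the sketch short; a real validator might collect all violations instead.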

7 Height of the Tree
The number of disk accesses for most B-tree operations is proportional to the height of the tree.
Theorem 18.1. If n >= 1, then for any n-key B-tree T of height h and minimum degree t >= 2: $h \le \log_t \frac{n+1}{2}$.
Proof. If a B-tree has height h, the root contains at least one key and all other nodes contain at least t-1 keys. Thus there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, and so on, up to at least $2t^{h-1}$ nodes at depth h.

8 Height of the Tree (cont.)
The number of keys n satisfies the inequality
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t - 1} = 2t^h - 1,$$
so $t^h \le (n+1)/2$ and therefore $h \le \log_t \frac{n+1}{2}$. ∎
Hence the height of a B-tree grows as $O(\log_t n)$. Both this and the $O(\lg n)$ height of a red-black tree are logarithmic in n, but the base t of the B-tree logarithm can be many times larger, so the height is smaller by a factor of about lg t. This means that the number of disk accesses is substantially reduced for most tree operations.
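To get a feel for the bound, the short Python sketch below evaluates it numerically. The helper names `min_keys` and `max_height` are illustrative, and the loop uses integer arithmetic rather than a floating-point logarithm to avoid rounding issues.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest h with 2*t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


n = 10 ** 9
print(max_height(n, 2))     # 28
print(max_height(n, 1000))  # 2
```

For a billion keys, a minimum degree of 2 allows a height of up to 28, while a minimum degree of 1000 caps the height at 2.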

9 Basic Operations
The root of the B-tree is always in main memory.
    Disk-Read on the root is never required.
    Disk-Write is required when the root node is changed.
Any nodes that are passed as parameters have already had Disk-Read performed on them.
All basic procedures are "one-pass" algorithms: they proceed downward from the root of the tree, without having to back up.

10 Searching
The searching algorithm takes as input a pointer to the root node x of a subtree and a key k. If k is in the subtree, it returns a pair (y, i) such that key_i[y] = k; otherwise it returns NIL.
B-Tree-Search(x,k)
 1 i = 1
 2 while i <= n[x] and k > key_i[x]
 3     i++
 4 if i <= n[x] and k = key_i[x]
 5     return (x,i)
 6 if leaf[x]
 7     return NIL
 8 else
 9     Disk-Read(c_i[x]) // read ith child of x
10     return B-Tree-Search(c_i[x],k)
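The search procedure translates almost line for line into Python. The following is a minimal in-memory sketch, not a disk-based implementation: the `Node` class and `btree_search` name are hypothetical, indices are 0-based, and the Disk-Read step is omitted because all nodes are assumed to be in memory already.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) with node.keys[index] == k, or None if absent."""
    i = 0
    # Find the first key that is not smaller than k.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:  # reached a leaf without finding k
        return None
    # A disk-based version would perform Disk-Read on the child here.
    return btree_search(x.children[i], k)


root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
node, index = btree_search(root, 15)
print(node.keys, index)        # [12, 15] 1
print(btree_search(root, 13))  # None
```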

11 Searching: Performance
The nodes encountered during the recursion form a path downward from the root of the tree. The number of disk pages accessed by B-Tree-Search is $O(h) = O(\log_t n)$. For each node, n[x] < 2t, hence the while loop on lines 2-3 takes O(t) time. Therefore the total CPU time is $O(th) = O(t \log_t n)$.

12 Inserting
General algorithm:
    Search for the leaf node y at which to insert the new key.
    If the node y is full (having 2t-1 keys):
        Split the full node around its median key, key_t[y]:
            Create two nodes with t-1 keys each.
            Move the median key up to y's parent.
        If y's parent is also full, it would have to be split too, and so on up the tree.
To avoid backing up, the key is inserted in a single pass down the tree.
    Each full node on the path is split along the way.
    This ensures that when the node y needs to be split, its parent cannot be full.

13 Splitting a Node
The procedure B-Tree-Split-Child takes as input a non-full internal node x, an index i, and a full child y of x: y = c_i[x]. The procedure then splits y in two and adjusts x so that it has an additional child. When the root needs to be split, a new root must be created first, and the tree grows in height by one. Splitting is the only means by which the tree grows.

14 Splitting Node: Pseudocode
B-Tree-Split-Child(x,i,y)
 1 z = Allocate-Node() // allocate a disk page
 2 leaf[z] = leaf[y]
 3 n[z] = t-1
 4 for j = 1 to t-1
 5     key_j[z] = key_{j+t}[y]
 6 if not leaf[y]
 7     for j = 1 to t
 8         c_j[z] = c_{j+t}[y]
 9 n[y] = t-1
   // shift children to the right
10 for j = n[x]+1 downto i+1
11     c_{j+1}[x] = c_j[x]
12 c_{i+1}[x] = z // add z as a new child

15 Splitting Node: Pseudocode (cont.)
   // make room for the median
13 for j = n[x] downto i
14     key_{j+1}[x] = key_j[x]
15 key_i[x] = key_t[y]
16 n[x]++
17 Disk-Write(y)
18 Disk-Write(z)
19 Disk-Write(x)
The CPU time is determined by the loops on lines 4-5 and 7-8, and is Θ(t). The other loops perform O(t) iterations. The procedure performs Θ(1) disk operations.
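The split can be sketched compactly in Python, where list slicing stands in for the copy loops of the pseudocode. This is an in-memory illustration under several assumptions: `Node` and `split_child` are hypothetical names, indices are 0-based (so the median is `keys[t - 1]`), the child is looked up from x rather than passed in, and the three Disk-Write calls are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] (2t-1 keys) around its median."""
    y = x.children[i]
    median = y.keys[t - 1]
    # z takes the largest t-1 keys and, for an internal node, the last t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    # y keeps the smallest t-1 keys and the first t children.
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # list.insert shifts later entries right, as the downto loops do.
    x.children.insert(i + 1, z)
    x.keys.insert(i, median)


parent = Node([], [Node([1, 2, 3])])
split_child(parent, 0, t=2)
print(parent.keys, [c.keys for c in parent.children])  # [2] [[1], [3]]
```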

16 Inserting a Key: Algorithm
B-Tree-Insert(T,k)
 1 r = root[T]
 2 if n[r] = 2t-1 // full node
 3     s = Allocate-Node()
 4     root[T] = s
 5     leaf[s] = FALSE
 6     n[s] = 0
 7     c_1[s] = r
       // split the old root
 8     B-Tree-Split-Child(s,1,r)
 9     B-Tree-Insert-Nonfull(s,k)
10 else
11     B-Tree-Insert-Nonfull(r,k)

17 Inserting a Key: Algorithm (cont.)
// Insert key k into a non-full node x
B-Tree-Insert-Nonfull(x,k)
 1 i = n[x]
 2 if leaf[x] // k is inserted in the ordered list
 3     while i >= 1 and k < key_i[x]
 4         key_{i+1}[x] = key_i[x]
 5         i--
 6     key_{i+1}[x] = k
 7     n[x]++
 8     Disk-Write(x)
 9 else // search the leaf to insert into

18 Inserting a Key: Algorithm (cont.)
10     while i >= 1 and k < key_i[x]
11         i--
12     i++
13     Disk-Read(c_i[x])
14     if n[c_i[x]] = 2t-1 // full node
15         B-Tree-Split-Child(x,i,c_i[x])
16         if k > key_i[x]
17             i++
18     B-Tree-Insert-Nonfull(c_i[x], k)
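Putting the pieces together, the one-pass insertion can be sketched in Python as follows. This is an in-memory illustration rather than the course's implementation: the names `Node`, `split_child`, `insert_nonfull`, `insert`, and `inorder` are hypothetical, indices are 0-based, disk operations are omitted, and `insert` returns the (possibly new) root instead of updating a tree object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.children.insert(i + 1, z)
    x.keys.insert(i, median)


def insert_nonfull(x: Node, k: int, t: int) -> None:
    """Insert k into the subtree rooted at x, which must not be full."""
    i = len(x.keys) - 1
    if not x.children:
        x.keys.append(k)  # grow the list by one slot
        while i >= 0 and k < x.keys[i]:
            x.keys[i + 1] = x.keys[i]  # shift larger keys right
            i -= 1
        x.keys[i + 1] = k
    else:
        while i >= 0 and k < x.keys[i]:
            i -= 1
        i += 1
        if len(x.children[i].keys) == 2 * t - 1:  # split full child first
            split_child(x, i, t)
            if k > x.keys[i]:
                i += 1
        insert_nonfull(x.children[i], k, t)


def insert(root: Node, k: int, t: int) -> Node:
    """Insert k and return the root, which changes if the old root was full."""
    if len(root.keys) == 2 * t - 1:
        s = Node(children=[root])
        split_child(s, 0, t)
        insert_nonfull(s, k, t)
        return s
    insert_nonfull(root, k, t)
    return root


def inorder(x: Node) -> List[int]:
    """List all keys of the subtree in sorted order."""
    if not x.children:
        return list(x.keys)
    out: List[int] = []
    for i, key in enumerate(x.keys):
        out += inorder(x.children[i]) + [key]
    return out + inorder(x.children[-1])


root = Node()
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    root = insert(root, key, t=2)
print(root.keys)      # [10, 20]
print(inorder(root))  # [5, 6, 7, 10, 12, 17, 20, 30]
```

Returning the root keeps the sketch free of a separate tree class; the pseudocode instead updates root[T] in place.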

19 Inserting a Key: Performance
The number of disk accesses performed by B-Tree-Insert is O(h) for a B-tree of height h.
    Only O(1) Disk-Read and Disk-Write operations are performed at each level by B-Tree-Insert-Nonfull.
The total CPU time is $O(th) = O(t \log_t n)$.
    At each level of the tree the number of CPU operations is determined by the while loops in B-Tree-Insert-Nonfull.
    The maximum number of iterations in these loops is 2t-1, hence the total time at each level is O(t).

