
Algorithm Design & Analysis – CS632 Group Project Jan Kosztala Bhuiyan Mahfuz Hasan Navaratnam Yogaparan “Yoga” John Stasko B-Trees Introduction Operations.


1 Algorithm Design & Analysis – CS632 Group Project
Jan Kosztala, Bhuiyan Mahfuz Hasan, Navaratnam Yogaparan “Yoga”, John Stasko
B-Trees: Introduction, Operations, Applications, Complexities, Summary

2 B-Trees
• Introduction
• Definitions
• Properties
• Basic Concepts
• Tree Structures

3 B-Tree Introduction
• Tree structures have their best running times when their height is as small as possible.
• Normally, data will be stored in the leaf nodes.
• Each level of the tree indirectly represents one disk access.
• When the tree height is small, the number of disk accesses will be small.
• When working with large sets of data, it is often not possible to maintain the entire structure in Main Memory (RAM).
• Most computers do not have enough RAM to store the complete file of data in memory.
• Smaller portions of the data must be moved between secondary storage and memory so they can be worked on.

4 B-Tree Introduction
• A magnetic disk is the most common form of secondary storage and is significantly slower than random access memory (RAM).
• Main memory (RAM) is located on the motherboard, so access to it is quick.
• A mechanical magnetic disk is slower, so data transfer between the disk and Main Memory takes time.
• Each B-Tree node may contain a large number of keys, which means fewer levels of nodes need to be traversed.
• With a large number of keys in each node, the B-Tree height will be relatively small due to greater branching.
• A large base level of leaf nodes not far from the Root node minimizes disk accesses.

5 B-Tree Introduction
• Secondary storage on large computers allows users to store, update, and recall data from large collections of information called files.
• Organizing files intelligently in a tree structure makes retrieving information more efficient.
• Each node of a tree is usually set to a size that the computer can manage efficiently.

6 B-Tree Definitions
• There is a single root node, which may have as few as two children (or none if the root is the only node in the tree).
• B-Trees are balanced: all leaves, in every branch of the tree, are at the same level.
[Figure: a root node with its key and children labeled]

7 B-Tree Definitions
• A Tree is a set of nodes.
• A leaf node has no children; no branches extend below the leaves.
• The “B” in the name B-Tree does not stand for “Binary”; it is commonly read as “Balanced”. Confusing the two is a common beginner’s mistake.
[Figure: symbol that represents a node]

8 B-Tree Definitions
• The order of a B-Tree, denoted by m, is the maximum number of children that can belong to a parent node. The maximum number of keys in a node is therefore m - 1.
• Every node, except the root, has at least t = ceil(m/2) children and therefore at least t - 1 = ceil(m/2) - 1 keys.
• All leaves occur at the same level.
• The keys in each node are in ascending order.
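The node-size limits above follow directly from the order m. A minimal sketch, assuming the definitions on this slide (the function name `btree_limits` is made up for illustration):

```python
import math

def btree_limits(m):
    """Node-size limits for a B-Tree of order m (m = maximum children per node)."""
    t = math.ceil(m / 2)          # minimum children of any non-root node
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children": t,
        "min_keys": t - 1,
    }

# Order 5, the order used in the insertion example of this deck.
print(btree_limits(5))
# {'max_children': 5, 'max_keys': 4, 'min_children': 3, 'min_keys': 2}
```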

9 B-Tree Basic Concepts
• A tree consists of a finite set of elements, called nodes, and a finite set of directed lines, called branches, that connect the nodes.
• When the branch is directed toward the node, it is an indegree branch.
• When the branch is directed away from the node, it is an outdegree branch.
• If the tree is not empty, then the first node is called the root.
• The indegree of the root is, by definition, zero.
• A leaf is any node with an outdegree of zero.
• Nodes that are neither the root nor a leaf are known as internal nodes, because they are found in the middle portion of the tree.

10 B-Tree Basic Concepts
• A node is a parent if it has successor nodes, that is, if it has an outdegree greater than zero.
• A node with a predecessor is a child. A child node has an indegree of one.
• Two or more nodes with the same parent are siblings.
• An ancestor is any node in the path from the root to the node.
• A descendant is any node in the path below the parent node; that is, all nodes in the paths from a given node to a leaf are descendants of that node.
• A path is a sequence of nodes in which each node is adjacent to the next one. Every node in a tree can be reached by following a unique path starting from the root.

11 B-Tree Basic Concepts
• The level of a node is its distance from the root. Because the root has a zero distance from itself, the root is at level 0. The children of the root are at level 1, their children are at level 2, and so forth. Note the relationship between levels and siblings: siblings are always at the same level, but not all nodes in a level are necessarily siblings.
• The height of the tree is the level of the leaf in the longest path from the root plus one. By definition the height of an empty tree is -1. Because the tree is drawn upside-down, some texts refer to the depth of a tree rather than its height.
• A tree may be divided into subtrees. A subtree is any connected structure below the root. The first node in a subtree is known as the root of the subtree and is used to name the subtree.

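12 B-Tree Basic Concepts
[Figure: tree with root A; A has children B, E, F; B has children C, D; F has children G, H, I. Branches (A,F) and (F,I) are labeled.]
Parents: A, B, F
Children: B, E, F, C, D, G, H, I
Siblings: {B, E, F}, {C, D}, {G, H, I}
Leaves: C, D, E, G, H, I
Internal Nodes: B, F
Level = 0, Height = 1: A
Level = 1, Height = 2: B, E, F
Level = 2, Height = 3: C, D, G, H, I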
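The terminology on this slide can be checked mechanically. The sketch below encodes the example tree as a plain dictionary (a representation chosen here only for illustration) and derives the parents, leaves, internal nodes, and height using the definitions from the preceding slides.

```python
# Example tree from the slide: each parent maps to its list of children.
children = {"A": ["B", "E", "F"], "B": ["C", "D"], "F": ["G", "H", "I"]}
root = "A"

def levels(node, level=0):
    """Yield (node, level) pairs; the root is at level 0."""
    yield node, level
    for child in children.get(node, []):
        yield from levels(child, level + 1)

level = dict(levels(root))
parents = sorted(children)                                # outdegree > 0
leaves = sorted(n for n in level if n not in children)    # outdegree == 0
internal = sorted(n for n in children if n != root)       # neither root nor leaf
height = max(level.values()) + 1                          # deepest level plus one

print(parents)    # ['A', 'B', 'F']
print(leaves)     # ['C', 'D', 'E', 'G', 'H', 'I']
print(internal)   # ['B', 'F']
print(height)     # 3
```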

13 Tree Structures
• Structures A and B are balanced and structure C is unbalanced.
[Figure: three trees with the root, nodes, leaves, and order labeled]
A: Order = 2, Height = 4, Leaves = 8
B: Order = 3, Height = 3, Leaves = 9
C: Order = 5, Height = 4, Leaves = 13

14 Operations Navaratnam Yogaparan “Yoga”

15 Insertion: B-Tree Principle or Rules
1. First, do a search for the key. If the item is not there, the search will end up at a leaf.
2. If there is room in this leaf, just insert it there. This may require that some existing keys be moved one position to the right to make room for the new item.
3. If the leaf node is full, then the node must be split, with about half of the keys going into a new node to the right of this one.
4. The median key is moved up into the parent node.
5. If the root node is ever split, the median key moves up into a new root node, thus causing the tree to increase in height by one.

16 Example
Keys: 3 14 7 1 8 5 11 17 13 6 23 12 20 26 4 16 18 24 25 19
Order: 5
Max # of children: 5
Max # of keys: 4
Min # of children: ceil(5 / 2) = 3
Min # of keys: 2
[Figure: node representation]

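17 Example (order 5), keys inserted so far: 3 14 7 1 8 5 11 17 13 6 23 12 20
Insert 3, 14, 7, 1: a single node [1 3 7 14].
Insert 8: the node overflows and splits; the median 7 moves up. Root [7]; leaves [1 3] [8 14].
Insert 5, 11, 17: leaves [1 3 5] [8 11 14 17].
Insert 13: the right leaf overflows and splits; the median 13 moves up. Root [7 13]; leaves [1 3 5] [8 11] [14 17].
Insert 6, 23, 12, 20: leaves [1 3 5 6] [8 11 12] [14 17 20 23].
Next key: 26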

18 Example (order 5), remaining keys: 26 4 16 18 24 25 19
Before: root [7 13]; leaves [1 3 5 6] [8 11 12] [14 17 20 23].
Insert 26: the rightmost leaf overflows and splits; the median 20 moves up. Root [7 13 20]; leaves [1 3 5 6] [8 11 12] [14 17] [23 26].
Next key: 4

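19 Example (order 5), remaining keys: 4 16 18 24 25 19
Before: root [7 13 20]; leaves [1 3 5 6] [8 11 12] [14 17] [23 26].
Insert 4: the leftmost leaf overflows and splits; the median 4 moves up. Root [4 7 13 20]; leaves [1 3] [5 6] [8 11 12] [14 17] [23 26].
Insert 16, 18, 24, 25: leaves [1 3] [5 6] [8 11 12] [14 16 17 18] [23 24 25 26].
Next key: 19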

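20 Example (order 5), last key: 19
Before: root [4 7 13 20]; leaves [1 3] [5 6] [8 11 12] [14 16 17 18] [23 24 25 26].
Insert 19: the leaf [14 16 17 18 19] overflows and splits; the median 17 moves up, giving root [4 7 13 17 20]. The root now overflows and splits as well; the median 13 moves up into a new root.
After: root [13]; internal nodes [4 7] [17 20]; leaves [1 3] [5 6] [8 11 12] [14 16] [18 19] [23 24 25 26].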
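The insertion rules and the order-5 example can be sketched in code. This is a minimal in-memory illustration, not the presenters' implementation: the names `Node`, `insert`, and `build` are made up here, and the split is done after a node overflows, which is how the slides' example proceeds.

```python
import bisect

ORDER = 5                  # m, as in the slides' example
MAX_KEYS = ORDER - 1

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # an empty list marks a leaf

def _insert(node, key):
    """Insert key below node. Return (median, right_node) if node split, else None."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                        # rule 1: key already present
    if not node.children:
        node.keys.insert(i, key)           # rule 2: shift keys right, place the new key
    else:
        split = _insert(node.children[i], key)
        if split:                          # rule 4: a child's median moved up
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2              # rule 3: split an overflowing node
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split:                              # rule 5: the root split, the tree grows
        median, right = split
        return Node([median], [root, right])
    return root

def build(keys):
    root = Node()
    for k in keys:
        root = insert(root, k)
    return root

KEYS = [3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12, 20, 26, 4, 16, 18, 24, 25, 19]
root = build(KEYS)
print(root.keys)                           # [13]
print([c.keys for c in root.children])     # [[4, 7], [17, 20]]
```

Running it on the slide's key sequence gives the same final tree as the example: root [13] over the internal nodes [4 7] and [17 20].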

21 Deletion: Principle or Rules
1. Search for the key. If the item is in a leaf and the leaf has more than the min # of keys, just delete the item.
2. If the item is in a leaf that has no extra key, but a sibling does, borrow through the parent: a key moves down from the parent into the leaf and a key moves up from the sibling into the parent.
3. If neither the leaf nor its siblings have an extra key, the leaf has to be combined with one of these siblings. This brings down the parent’s key.

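22 Delete 8
Before: root [13]; internal nodes [4 7] [17 20]; leaves [1 3] [5 6] [8 11 12] [14 16] [18 19] [23 24 25 26].
After: the leaf [8 11 12] has an extra key, so 8 is simply removed, leaving [11 12].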

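23 Delete 20
Before: root [13]; internal nodes [4 7] [17 20]; leaves [1 3] [5 6] [11 12] [14 16] [18 19] [23 24 25 26].
After: 20 is in an internal node, so it is replaced by 23 from the leaf [23 24 25 26]. Internal nodes [4 7] [17 23]; rightmost leaves [18 19] [24 25 26].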

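24 Delete 18
Before: root [13]; internal nodes [4 7] [17 23]; leaves [1 3] [5 6] [11 12] [14 16] [18 19] [24 25 26].
After: the leaf [19] is below the minimum, so it borrows: 23 moves down from the parent and 24 moves up from the right sibling. Internal nodes [4 7] [17 24]; rightmost leaves [19 23] [25 26].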

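25 Delete 5
Before: root [13]; internal nodes [4 7] [17 24]; leaves [1 3] [5 6] [11 12] [14 16] [19 23] [25 26].
After: the leaf [6] is below the minimum and no sibling has an extra key, so it is combined with [1 3] and the parent key 4 into [1 3 4 6]. The internal node [7] is now below the minimum. Root [13]; internal nodes [7] [17 24]; leaves [1 3 4 6] [11 12] [14 16] [19 23] [25 26].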

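26 Delete 5 (continued)
Before: root [13]; internal nodes [7] [17 24]; leaves [1 3 4 6] [11 12] [14 16] [19 23] [25 26].
After: the internal node [7] is combined with its sibling [17 24] and the root key 13, and the tree loses one level. Root [7 13 17 24]; leaves [1 3 4 6] [11 12] [14 16] [19 23] [25 26].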
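The three leaf-level deletion cases can be sketched as follows. This is a simplified illustration under stated assumptions, not a full B-Tree delete: it works on one parent (a list of keys) and its leaves (lists of keys), the name `delete_from_leaf` is hypothetical, and a parent that ends up below the minimum would need the same procedure applied one level up, which is not shown.

```python
MIN_KEYS = 2    # order 5: ceil(5 / 2) - 1

def delete_from_leaf(parent_keys, leaves, i, key):
    """Delete key from leaves[i], repairing an underflow by borrowing or merging.

    parent_keys[j] is the separator between leaves[j] and leaves[j + 1].
    Both lists are modified in place; the return value names the case applied.
    """
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return "deleted"                              # rule 1
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:       # rule 2, left sibling has a spare
        leaf.insert(0, parent_keys[i - 1])            # parent key moves down
        parent_keys[i - 1] = leaves[i - 1].pop()      # sibling key moves up
        return "borrowed from left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaf.append(parent_keys[i])                   # rule 2, right sibling has a spare
        parent_keys[i] = leaves[i + 1].pop(0)
        return "borrowed from right"
    j = i - 1 if i > 0 else i                         # rule 3: merge leaves j and j + 1
    leaves[j] = leaves[j] + [parent_keys.pop(j)] + leaves.pop(j + 1)
    return "merged"

# "Delete 18" from the slides: parent [17 23], leaves [14 16] [18 19] [24 25 26].
parent, leaves = [17, 23], [[14, 16], [18, 19], [24, 25, 26]]
print(delete_from_leaf(parent, leaves, 1, 18), parent, leaves)
# borrowed from right [17, 24] [[14, 16], [19, 23], [25, 26]]

# "Delete 5" from the slides: parent [4 7], leaves [1 3] [5 6] [11 12].
parent, leaves = [4, 7], [[1, 3], [5, 6], [11, 12]]
print(delete_from_leaf(parent, leaves, 1, 5), parent, leaves)
# merged [7] [[1, 3, 4, 6], [11, 12]]
```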

27 Applications Bhuiyan Mahfuz Hasan

28 B-TREE APPLICATIONS
• Database
• Concurrent Access to B-Trees
• Locking
• Scheduling
• Serializability
• Two-Phase Locking

29 Databases
• A B-tree is a method of placing and locating files (called records or keys) in a database.
• The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.
• B-trees are preferred when decision points, called nodes, are on hard disk rather than in random-access memory (RAM). B-trees save time by using nodes with many branches (called children).

30 EXAMPLE
• In a tree, records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them.
• The image shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied, and is called a null). This B-tree has a depth of three.
[Figure: B-tree of order three with nine leaf positions labeled 1 to 9]

31 Example
• A B-tree keeps its depth close to the minimum possible for the number of records it contains.
• Because databases typically cannot be maintained entirely in memory, B-trees are often used to index the data and to provide fast access.
Searching an unindexed and unsorted database containing n key values has a worst case running time of O(n). Indexed with a B-tree, the same search operation runs in O(log n). To search for a single key in a set of one million keys (1,000,000), a linear search requires at most 1,000,000 comparisons. If the same data is indexed with a B-tree of order 10 and height 9, at most 81 comparisons (9 levels with up to 9 keys per node) are required in the worst case.

32 Concurrent Access to B-Trees
• Databases typically run in multi-user environments where many users can concurrently perform operations on the database.
Example of a complication: Joint Account of Jack and Jill, Initial Balance = $60
Jack: Read Balance = $60; Debit = $40, Balance = $20; Write Balance = $20
Jill: Read Balance = $60; Debit = $30, Balance = $30; Write Balance = $30
Both users read the initial balance of $60, so whichever write happens last overwrites the other update.

33 Solutions for Concurrent Problem
• The simplest solution is to serialize access to the data structure.
• If another process is using the tree, all other processes must wait.
• Locking provides a mechanism for controlling concurrent operations on data structures in order to prevent undesirable side effects and to ensure consistency.

34 Locking
• Locking is needed when multiple users are updating the same database at the same time.
• Locks are used when updating a record to prevent other users from simultaneously updating the same record.
• Locks apply to individual instances of a record. They do not apply to the database as a whole.

35 The lock types are:
Lock Type          Command   Requests                Allows Other Users
CONCURRENT READ    CR        Read access             Readers, Writers
CONCURRENT WRITE   CW        Read and write access   Readers, Writers
PROTECTED READ     PR        Read access             Readers, No writers
PROTECTED WRITE    PW        Read and write access   Readers, No writers
EXCLUSIVE          EX        Read and write access   No

36 Lock types (continued)
• Concurrent Read allows read-only access to the record. Other users can still access the record for read and write.
• Concurrent Write allows both read and write access, while also allowing other users both read and write access. (It is strongly recommended NOT to use this lock type, as updates could be lost.)
• Protected Read allows read-only access to the record. Other users can read the data, but no writers are allowed.
• Protected Write allows read and write access. Other users can read the data, but no writers are allowed.
• Exclusive establishes read and write access to the record. No other users can access the record.

37 Concurrent B-Tree Example
[Figure: an Admin and Users 1 to 5 concurrently accessing a B-tree with positions labeled 1 to 9]

38 Generalized Lock Types
• Shared Lock: a read-only lock. A user holding a shared lock can only read.
• Exclusive Lock: a read and write lock. A user holding an exclusive lock can both read and write.

39 Schedule
• A schedule is a list of actions (reading, writing, aborting, or committing) from a set of transactions.
• The order in which two actions of a transaction T appear in a schedule must be the same as the order in which they appear in T.
T1: R(A) W(A) R(B) W(B) Commit
T2: R(A) W(A) R(B) W(B) Commit
(Interleaved actions, reading uncommitted data / dirty read)

40 Serializability
• It allows concurrency when there is no conflict.
• It prevents concurrency when there is a conflict.
Strict Two-Phase Locking (Strict 2PL)
• If a transaction T wants to read (respectively, modify) an object, it first requests a shared read (respectively, exclusive read/write) lock on the object.
• All locks held by a transaction are released when the transaction is completed.

41 Example
S = Shared lock, X = Exclusive lock, R = Read, W = Write. A, B, C are objects.
T1: S(A) R(A) X(C) R(C) W(C) Commit
T2: S(A) R(A) X(B) R(B) W(B) Commit

42 Solution for Jack and Jill Problem
Joint Account of Jack and Jill, Initial Balance = $60
Jack: Exclusive Lock By Jack; Read Balance = $60; Debit = $40, Balance = $20; Write Balance = $20; Commit
Jill: Shared (Read) Lock By Jill; Read Balance = $60; Lock By Jack; Exclusive Lock By Jill; Read Balance = $20; Debit = $30, Balance = -$10; Abort
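The effect of the exclusive lock can be sketched with Python's standard `threading.Lock`. This is only an analogy for the database locks described in these slides, not a database implementation; the `Account` class and its `debit` method are made up for illustration. The lock covers the whole read-modify-write, so the second debit sees the updated balance and aborts instead of overdrawing the account.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()          # plays the role of the exclusive lock

    def debit(self, amount):
        """Return True if the debit commits, False if it aborts."""
        with self._lock:                       # held until commit or abort
            current = self.balance             # read
            if current < amount:
                return False                   # abort: balance would go negative
            self.balance = current - amount    # write
            return True                        # commit

account = Account(60)
results = {}

def withdraw(name, amount):
    results[name] = account.debit(amount)

threads = [
    threading.Thread(target=withdraw, args=("Jack", 40)),
    threading.Thread(target=withdraw, args=("Jill", 30)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one debit commits; which one depends on who acquires the lock first.
print(account.balance, results)
```

Whichever thread takes the lock first commits, so the final balance is $20 or $30 and never negative.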

43 Complexities John Stasko

44 Complexities & Enhancements
• Data Structure
• B-Tree Organization
• B-Tree Height
  • Number of Keys vs. Tree Height
  • Max Keys = Min Height (Best Case)
  • Min Keys = Max Height (Worst Case)
• Running Time
  • Search Tree, Insert Tree, Delete Tree
• B-Tree Improvements
  • B-Tree Variants
  • B*Trees
  • B+Trees

45 Data Structure
• B-Trees generalize binary search trees; the keys within a node are kept in Left to Right order.
• The X keys in a node divide the range of keys into X + 1 subranges, one for each of its children.
• An [X + 1]-way decision based on comparisons with the node's keys is used to search the B-Tree.
[Figure: a node with keys X1 to X9]
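The multiway decision described here can be sketched as a short search routine. The representation is an assumption made for illustration: each node is a `(keys, children)` pair and a leaf has an empty child list. The small tree mirrors the final state of the order-5 insertion example from the Operations part of the deck.

```python
from bisect import bisect_left

def search(node, key, visited=0):
    """Return (found, nodes_visited) for key in the subtree rooted at node."""
    keys, children = node
    visited += 1
    i = bisect_left(keys, key)          # picks one of len(keys) + 1 subranges
    if i < len(keys) and keys[i] == key:
        return True, visited
    if not children:                    # reached a leaf without finding the key
        return False, visited
    return search(children[i], key, visited)

def leaf(*keys):
    return (list(keys), [])

tree = ([13], [
    ([4, 7], [leaf(1, 3), leaf(5, 6), leaf(8, 11, 12)]),
    ([17, 20], [leaf(14, 16), leaf(18, 19), leaf(23, 24, 25, 26)]),
])

print(search(tree, 19))   # (True, 3)
print(search(tree, 13))   # (True, 1)
print(search(tree, 9))    # (False, 3)
```

The number of nodes visited stands in for the number of disk accesses when each node occupies one disk page.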

46 B-Tree Organization
• B-Trees are well suited for secondary storage devices.
• For a typical B-Tree application, the amount of data will be so large that most of it resides on a secondary storage device.
• The B-Tree algorithms read and write pages of data between the disk and Main Memory as needed.
• Only a small number of pages will be required in memory at one time.
• A large Branching Factor reduces the number of disk accesses required to find a key.
• A usual Branching Factor for a typical B-Tree application is anywhere from 50 to 2000, depending on the size of the key relative to the size of a page.
• Consider a B-Tree whose Root node resides in memory and whose nodes carry the data that their keys reference. With a height of 2, at most 2 disk accesses are needed to find any key in the tree; for a tree of fixed height this cost is constant, O(1).
• Every n-key B-Tree has a height of O(log n).

47 B-Tree Organization
• There are two common ways that key-referenced data may be handled within a B-Tree organization:
• The first way is that the node contains the key and the actual data associated with it.
  • In this case, when searching for the data, the tree only has to be traversed until it reaches the node where the key resides.
• The second way is that the data is stored in the leaves of the tree, and the internal nodes store only keys and child pointers.
  • In this case, when searching for the data, the tree has to be traversed all the way down to the leaf where the data is stored.

48 B-Tree Height
• Best Case: Maximum # of keys = Minimum Tree Height
Order = m, Height = h, Keys = k. Every node holds m - 1 keys and has m children:
h          nodes      Max # of keys
1 (Root)   1          m - 1
2          m          m(m - 1)
3          m^2        m^2(m - 1)
h          m^(h-1)    m^(h-1)(m - 1)
Using the sum of a geometric series:
$S = a + ar + ar^2 + \dots + ar^{n-1} = \dfrac{a(r^n - 1)}{r - 1}$
the total number of keys is:
$k = (m-1) + m(m-1) + m^2(m-1) + \dots + m^{h-1}(m-1) = (m-1)(1 + m + m^2 + \dots + m^{h-1})$
$k = (m-1) \cdot \dfrac{m^h - 1}{m - 1} = m^h - 1$
Now we rearrange terms:
$m^h = k + 1$
and take the logarithm base m:
$h = \log_m (k + 1)$

49 B-Tree Height
• Worst Case: Minimum # of keys = Maximum Tree Height
Order = m, Height = h, Keys = k. Min children = ceil(m/2), which we will call c. The root holds 1 key and has 2 children; every other node holds c - 1 keys and has c children:
h          nodes       Min # of keys
1 (Root)   1           1
2          2           2(c - 1)
3          2c          2c(c - 1)
4          2c^2        2c^2(c - 1)
h          2c^(h-2)    2c^(h-2)(c - 1)
Using the sum of a geometric series:
$S = a + ar + ar^2 + \dots + ar^{n-1} = \dfrac{a(r^n - 1)}{r - 1}$
the total number of keys is:
$k = 1 + 2(c-1) + 2(c-1)c + 2(c-1)c^2 + \dots + 2(c-1)c^{h-2} = 1 + 2(c-1)(1 + c + c^2 + \dots + c^{h-2})$
$k = 1 + 2(c-1) \cdot \dfrac{c^{h-1} - 1}{c - 1} = 1 + 2c^{h-1} - 2 = 2c^{h-1} - 1$
Now we rearrange terms:
$c^{h-1} = \dfrac{k + 1}{2}$
and take the logarithm base c = ceil(m/2):
$h = 1 + \log_{\lceil m/2 \rceil} \dfrac{k + 1}{2}$

50 B-Tree Height
• Best Case (Min Height): $h = \log_m (k + 1)$
• Worst Case (Max Height): $h = 1 + \log_{\lceil m/2 \rceil} ((k + 1)/2)$
• General Case: $\lceil \log_m (k + 1) \rceil \le h \le \lfloor 1 + \log_{\lceil m/2 \rceil} ((k + 1)/2) \rfloor$
• Change of base: $\log_m k = \log_{10} k / \log_{10} m$
• Special Case: m = 2, Max Height = Min Height (ceil(m/2) = 1, so the worst-case logarithm is undefined)
Min. Height vs. Max. Height for k = 1,000,000 elements and order m = 2, 3, 5, 10, 50, 200:
m     k        Min. Tree Height        Max. Tree Height
2     1x10^6   ceil(19.93) = 20        20
3     1x10^6   ceil(12.58) = 13        floor(19.93) = 19
5     1x10^6   ceil(8.58) = 9          floor(12.94) = 12
10    1x10^6   ceil(6.0000004) = 7     floor(9.15) = 9
50    1x10^6   ceil(3.53) = 4          floor(5.08) = 5
200   1x10^6   ceil(2.61) = 3          floor(3.85) = 3
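The height bounds can be evaluated without floating-point logarithms by using the key counts from the two derivations directly: a tree of height h holds at most m^h - 1 keys and at least 2c^(h-1) - 1 keys, with c = ceil(m/2). The sketch below is an illustration under those formulas; the function names are made up here.

```python
import math

def min_height(k, m):
    """Smallest h such that a completely full tree can hold k keys: m**h - 1 >= k."""
    h = 0
    while m ** h - 1 < k:
        h += 1
    return h

def max_height(k, m):
    """Largest h such that a minimally filled tree fits in k keys: 2*c**(h-1) - 1 <= k."""
    c = math.ceil(m / 2)
    if c < 2:
        raise ValueError("the worst-case bound needs ceil(m/2) >= 2")
    h = 1
    while 2 * c ** h - 1 <= k:      # is a tree of height h + 1 still possible?
        h += 1
    return h

for m in (3, 5, 10, 50, 200):
    print(m, min_height(10**6, m), max_height(10**6, m))
# 3 13 19
# 5 9 12
# 10 7 9
# 50 4 5
# 200 3 3
```

For m = 10 the exact minimum is 7 rather than 6, because a completely full tree of height 6 holds only 10^6 - 1 = 999,999 keys.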

51 Running Time: Search Tree
• Running Time for a typical B-Tree application consists of the number of disk accesses required and the computing time of the CPU.
• During a disk Read or Write, an entire page of information is accessed.
• A B-Tree node is usually the size of a page of information.
• Once the page is in memory, the CPU time to access Main Memory is very small compared to the disk accesses.
• The number of disk accesses is measured in terms of pages that have to be read from or written to the disk.
• The number of disk pages accessed is $\Theta(h)$, which is equal to $\Theta(\log_t n)$, where h is the height of the Tree and n is the number of keys in the Tree.
• The CPU time it takes to traverse within each node is $O(t)$.
• The Total Time is $O(th)$, which is equal to $O(t \log_t n)$, or ≈ $O(\log n)$ when t is treated as a constant.

52 Insert Tree
• Special coding in Split_Child is required so the recursion process in Insert_Tree never descends to a full node (the key is moved up, not down).
• The number of disk pages accessed is $O(h)$, which is equal to $O(\log_t n)$.
• The CPU time to traverse within each node is $O(t)$.
• The Total Time is $O(th)$, which is equal to $O(t \log_t n)$ or ≈ $O(\log n)$.
Delete Tree
• Special coding is required so the recursion process in Delete_Tree never descends to a node with fewer than the minimum of t – 1 keys.
• The number of disk pages accessed is $O(h)$, which is equal to $O(\log_t n)$.
• The CPU time it takes to traverse within each node is $O(t)$.
• The Total Time is $O(th)$, which is equal to $O(t \log_t n)$ or ≈ $O(\log n)$.

53 Improvements
• B-Tree Variants
  • B* and B+ Trees are 2 derivatives of a B-Tree.
  • Branching factors are improved while holding down node sizes.
• B*Trees
  • Increase the minimum number of children allowed.
  • The minimum branching factor for a B*Tree is ceil((2m - 1)/3).
  • The minimum branching factor for a B-Tree is ceil(m/2).
As an example, listed are the minimum numbers of keys in a Tree of height 3 with order m:
m     B-Tree (height 3)   B*Tree (height 3)
5     17                  17
10    49                  97
20    199                 337
50    1249                2177
100   4999                8977
200   19,999              35,377
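The table values follow from the worst-case key count 2c^(h-1) - 1 derived on the B-Tree Height slides, with c set to each variant's minimum branching factor. A short sketch that recomputes them (the function name `min_keys` is made up here):

```python
def min_keys(height, min_children):
    """Minimum number of keys in a tree of the given height: 2 * c**(h - 1) - 1."""
    return 2 * min_children ** (height - 1) - 1

for m in (5, 10, 20, 50, 100, 200):
    c_btree = (m + 1) // 2          # ceil(m / 2)
    c_bstar = (2 * m + 1) // 3      # ceil((2m - 1) / 3)
    print(m, min_keys(3, c_btree), min_keys(3, c_bstar))
# 5 17 17
# 10 49 97
# 20 199 337
# 50 1249 2177
# 100 4999 8977
# 200 19999 35377
```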

54 Improvements
• B+Trees
  • In a B+Tree, the data usually associated with keys is not stored in the tree at all. Each node can contain up to:
    • m child pointers
    • m – 1 keys
    • m – 1 pointers to values stored elsewhere
  • This has an effect on performance, because an additional disk access (h + 1 in total) is required to reach the data.
  • To resolve this inefficiency:
    • The data is stored in the leaves.
    • Keys are located in the nodes farther up in the Tree.
    • Copies of these keys must also be stored in the leaves so the data can be accessed quickly.

55 Improvements
• Another B+Tree organization technique is to have an Index at the top, with the keys and the associated data stored in the leaves.
• Leaf nodes are linked together Left-to-Right.
• The linked list of leaves is known as a Sequence Set.
• The Linked Sequence Set allows sequential processing.
[Figure: Root Node, Index, and linked Leaves]
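Sequential processing over the Sequence Set can be sketched as a walk along the linked leaves. The `Leaf` class, the helper names, and the key values below are hypothetical; in a real B+Tree the Index would be used to find the starting leaf, whereas this sketch simply starts at the leftmost leaf.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys stored in this leaf
        self.next = None      # link to the leaf on the right

def link(leaves):
    """Chain the leaves Left-to-Right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, lo, hi):
    """Yield every key k with lo <= k <= hi by following the leaf links."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return        # keys are sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next

first = link([Leaf([10, 11]), Leaf([17, 18]), Leaf([20, 23]), Leaf([27, 30])])
print(list(range_scan(first, 11, 23)))   # [11, 17, 18, 20, 23]
```

Once the first matching leaf is reached, the scan never goes back up into the Index, which is why the Sequence Set makes range queries cheap.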

56 Improvements
• B+ Trees with an Index and Sequence Set take a different approach to handling information in the Search, Insert, and Delete operations.
• In the Search operation, the leaves have pointers to data in close proximity to the initial search, so the time spent finding additional related data is minimized.
• The Insert operation is similar to a B-Tree’s, except when the split of a leaf takes place.
• A copy of the middle key gets promoted into the Index; the original key is retained in the right side of the leaf.
• This helps in the Search operation: when the key is found in the Index, the Search continues directly to the leaf, and the leaves are then searched sequentially.

57 Improvements
[Figure: a B+ Tree Index and its leaves before and after a deletion]
• In the Delete operation, since all the keys reside in leaves, deletion becomes simple.
• The Index does not always need changing. Non-key values in the Index do not need deleting; they are kept as separators, as long as the leaves they address keep at least their minimum occupancy, ceil(m/2).

58 Summary John Stasko

59 Summary
• A B-Tree is a balanced, multiway file organization.
• Efficient, versatile, simple, and easily maintained.
• Variations such as B+ Trees allow:
  • Efficient sequential processing of files
• Find, Insert, and Delete operations retain desirable logarithmic costs.
• B-Tree schemes guarantee at least 50% storage utilization.
• Various implementation techniques provide:
  • Enhanced performance
  • Generality
  • Ability to be used in multi-user environments

