Tries.

Slides:



Advertisements
Similar presentations
 Definition of B+ tree  How to create B+ tree  How to search for record  How to delete and insert a data.
Advertisements

Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
Advanced Database Discussion B Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
COMP 451/651 Indexes Chapter 1.
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Data Structures Using C++ 2E Chapter 11 Binary Trees and B-Trees.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
CS4432: Database Systems II
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
Spring 2006 Copyright (c) All rights reserved Leonard Wesley0 B-Trees CMPE126 Data Structures.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
Multiway Trees. Trees with possibly more than two branches at each node are know as Multiway trees. 1. Orchards, Trees, and Binary Trees 2. Lexicographic.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
External Searching: B-Trees Dr. Jicheng Fu Department of Computer Science University of Central Oklahoma.
 … we have been assuming that the data collections we have been manipulating were entirely stored in memory.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
2-3 Trees, Trees Red-Black Trees
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
© 2010 Pearson Addison-Wesley. All rights reserved. Addison Wesley is an imprint of CHAPTER 12: Multi-way Search Trees Java Software Structures: Designing.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
B-trees Eduardo Laber David Sotelo. What are B-trees? Balanced search trees designed for secondary storage devices Similar to AVL-trees but better at.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Lexicographic Search:Tries All of the searching methods we have seen so far compare entire keys during the search Idea: Why not consider a key to be.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
B-Trees Katherine Gurdziel 252a-ba. Outline What are b-trees? How does the algorithm work? –Insertion –Deletion Complexity What are b-trees used for?
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
B-Trees B-Trees.
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
B+-Trees.
B+ Tree.
Wednesday, April 18, 2018 Announcements… For Today…
B-Trees.
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
B-Trees.
B-Trees.
Presentation transcript:

Tries

Multiway Trees Trees with possibly more than two branches at each node are know as Multiway trees. 1. Orchards, Trees, and Binary Trees 2. Lexicographic Search Trees: Tries 3. External Searching: B-Trees 4. Red-Black Trees

Lexicographic Search Trees: Tries We can apply the idea of table lookup to information retrieval from a tree by using a key or part of a key to make a multiway branch. Instead of searching by comparison of entire keys, we can consider a key as a sequence of characters, and use these characters to determine a multiway branch at each step. We make a 26-way branch according to the first letter of the name, followed by another branch according to the second letter, and so on. This multiway branching is the idea of a thumb index in a dictionary.

Lexicographic Search Trees: Tries A thumb index is generally used only to find the words with a given initial letter; some other search method is then used to continue. In a computer we can proceed two or three levels by multiway branching, but then the tree will become too large, and we shall need to resort to some other device to continue.

Lexicographic Search Trees: Tries To keep the size of such a multiway tree we could prune all the branches from the tree that do not lead to any key. e.g., in English, there are no words that begin with the letters ‘bb’, ‘bc’, ‘bf’, ‘bg’ …, but there are words beginning with ‘ba’, ‘bd’, or ‘be’. Hence all the branches and nodes for nonexixtent words can be removed from the tree. The resulting tree is called a trie. This term originated as letters extracted from the word retrieval.

Lexicographic Search Trees: Tries DEFINITION: - A trie of order m is either empty or consists of an ordered sequence of exactly m tries of order m.

Searching for a Key in Tries A trie describing the English words (as listed in the Oxford English Dictionary) made up only from the letters a, b, and c is shown in Figure 11.7. Along with the branches to the next level of the trie, each node contains a pointer to a record of information about the key, if any, that has been found when the node is reached. The search for a particular key begins at the root. The first letter of the key is used as an index to determine which branch to take. An empty branch means that the key being sought is not in the tree. Otherwise, we use the second letter of the key to determine the branch at the next level, and so continue.

Searching for a Key in Tries When we reach the end of the word, the information pointer directs us to the desired information. We shall use a NULL information pointer to show that the string is not a word in the trie. Note, therefore, that the word a is a prefix of the word aba, which is a prefix of the word abaca. On the other hand, the string abac is not an English word, and therefore its node has a NULL information pointer.

A Trie

Lexicographic Search Trees: Tries The number of steps required to search a trie or insert into it is proportional to the number of characters making up a key, not to a logarithm of the number of keys as in other tree-based searches. No. of searces  No. of characters in the key.

Insertion into a Trie Adding a new key to a trie is quite similar to searching for the key: We must trace our way down the trie to the appropriate point and set the data pointer to the information record for the new key. If, on the way, we hit a NULL branch in the trie, we must not terminate the search, but instead we must create new nodes and put them into the trie so as to complete the path corresponding to the new key.

Deletion from a Trie The same general plan used for searching and insertion also works for deletion from a trie. We trace down the path corresponding to the key being deleted, and when we reach the appropriate node, we set the corresponding data member to NULL. If now, however, this node has all its members NULL (all branches and the data member), then we should delete this node. To do so, we can set up a stack of pointers to the nodes on the path from the root to the last node reached. Alternatively, we can use recursion in the deletion algorithm and avoid the need for an explicit stack.

Assessment of Tries The number of steps required to search a trie (or insert into it) is proportional to the number of characters making up a key, not to a logarithm of the number of keys as in other tree-based searches. If this number of characters is small relative to the (base 2) logarithm of the number of keys, then a trie may prove superior to a binary tree. If, for example, the keys consist of all possible sequences of five letters, then the trie can locate any of n = 265 = 11,881,376 keys in 5 iterations, whereas the best that binary search can do is lg n  23.5 key comparisons.

Assessment of Tries In many applications, however, the number of characters in a key is larger, and the set of keys that actually occur is sparse in the set of all possible keys. In these applications, the number of iterations required to search a trie may very well exceed the number of key comparisons needed for a binary search. The best solution, finally, may be to combine the methods. A trie can be used for the first few characters of the key, and then another method can be employed for the remainder of the key. If we return to the example of the thumb index in a dictionary, we see that, in fact, we use a multiway branch to locate the first letter of the word, but we then use some other search method to locate the desired word among those with the same first letter.

Multiway Trees

External Searching So far we have assumed that all our data structures are kept in high-speed memory; that is, we have considered only internal information retrieval. For many applications, this assumption is reasonable, but for many other important applications, it is not. Let us now turn briefly to the problem of external information retrieval, where we wish to locate and retrieve records stored in a file.

Access Time The time required to access and retrieve a word from high-speed memory is a few nanoseconds at most. The time required to locate a particular record on a disk is measured in milliseconds, and for floppy disks can exceed a second. Hence the time required for a single access is millions of times greater for external retrieval than for internal retrieval. block of storage: On the other hand, when a record is located on a disk, the normal practice is not to read only one word, but to read in a large page or block of information at once, to save time. Typical sizes for blocks range from 256 to 1024 characters or words.

Access Time Why should it matter? speed of access: memory: 10 nsec typical (10/1,000,000,000 seconds) disk: milliseconds (1/1000 seconds) floppy disk: seconds A normal binary search in 1000 elements would access memory for log21000 = 10 x 10 ns where as 10 harddisk access would be 1000000x10 ns.

Access Time Our goal in external searching must be to minimize the number of disk accesses, since each access takes so long compared to internal computation. With each access, however, we obtain a block that may have room for several records. Using these records we may be able to make a multiway decision concerning which block to access next. Hence multiway trees are especially appropriate for external searching.

Multiway Search Trees Binary search trees generalize directly to multiway (m-way) search trees in which, for some integer m called the order of the tree, each node has at most m children. If k  m is the number of children, then the node contains exactly k - 1 keys, which partition all the keys in the subtrees into k subsets. If some of these subsets are empty, then the corresponding children in the tree are empty. Figure 11.8 shows a 5-way search tree (between 1 and 4 entries in each node) in which some of the children of some nodes are empty.

A 5-way search tree (not a B-tree) since some nodes have empty children, some have too few children, and the leaves are not all on the same level.

Multiway Search Trees An m-way search tree is a tree in which, for some integer m called the order of the tree, each node has at most m children. If k  m is the number of children, then the node contains exactly k - 1 keys, which partition all the keys into k subsets.

Multiway Search Tree We want a “Multiway Search Tree” that minimizes file accesse. Each node should be a block of data on the disk: Each node then requires a read of the disk, and each link points to a logical “area” of the disk file

minimize disk reads Movement level to level requires another disk read. so, we want the number of levels to be few. i.e. the height of the tree should be as small as possible. We want to minimize disk reads

Searches are costly we want all searches to terminate in the same number of accesses i.e. don’t want long searches and short searches, rather balanced performance so insist that all leaves are at the same level.

minimal number of children Insist that every node have some minimal number of children this helps us get “the most” from each node

Access Time don’t want: prefer:

Balanced Multiway Trees (B-Trees)

B-Tree A B-Tree is a special type of multiway search tree that satisfies the design goals discussed above. As we look at the definition of a B-Tree we will notice that the design goals outlined above are incorporated directly into the definition. We will discuss this definition and look at some examples of structures that are and are not B-Trees.

Balanced Multiway Trees Our goal is to devise a multiway search tree that will minimize file accesses; hence we wish to make the height of the tree as small as possible. We can accomplish this by insisting, first, that no empty subtrees appear above the leaves (so that the division of keys into subsets is as efficient as possible); second, that all leaves be on the same level (so that searches will all be guaranteed to terminate with about the same number of accesses); and, third, that every node (except the leaves) have at least some minimal number of children. We shall require that each node (except the leaves) have at least half as many children as the maximum possible. These conditions lead to the following formal definition:

Balanced Multiway Trees (B-Trees)

A B-Tree of order 2 A B-Tree of order 2 is a complete binary search tree. These all honour rules 1 & 2, but violate rule 3 adding rule 3, these are all B-trees of order 2

Balanced Multiway Trees (B-Trees)

Insertion into a B-Tree In contrast to binary search trees, B-trees are not allowed to grow at their leaves; instead, they are forced to grow at the root. General insertion method: 1. Search the tree for the new key. This search (if the key is truly new) will terminate in failure at a leaf. 2. Insert the new key into to the leaf node. If the node was not previously full, then the insertion is finished. 3. When a key is added to a full node, then the node splits into two nodes, side by side on the same level, except that the median key is not put into either of the two new nodes. 4. When a node splits, move up one level, insert the median key into this parent node, and repeat the splitting process if necessary. 5. When a key is added to a full root, then the root splits in two and the median key sent upward becomes a new root. This is the only time when the B-tree grows in height.

Growth of a B-Tree

Growth of a B-Tree

Growth of a B-Tree

The quick brown fox jumpes over the lazy dog. A B-tree of order 5 will contain up to 4 keys in each node.

The quick brown fox jumpes over the lazy dog.

The quick brown fox jumpes over the lazy dog.

The quick brown fox jumpes over the lazy dog.

Deletion from a B-Tree If the entry that is to be deleted is not in a leaf, then its immediate predecessor (or successor) under the natural order of keys is guaranteed to be in a leaf. We promote the immediate predecessor (or successor) into the position occupied by the deleted entry, and delete the entry from the leaf. If the leaf contains more than the minimum number of entries, then one of them can be deleted with no further action. If the leaf contains the minimum number, then we first look at the two leaves (or, in the case of a node on the outside, one leaf) that are immediately adjacent to each other and are children of the same node. If one of these has more than the minimum number of entries, then one of them can be moved into the parent node, and the entry from the parent moved into the leaf where the deletion is occurring. If the adjacent leaf has only the minimum number of entries, then the two leaves and the median entry from the parent can all be combined as one new leaf, which will contain no more than the maximum number of entries allowed. If this step leaves the parent node with too few entries, then the process propagates upward. In the limiting case, the last entry is removed from the root, and then the height of the tree decreases.

Deletion from a B-Tree

Deletion from a B-Tree

Deletion from a B-Tree

Red-Black Tree In your text book, the writer has used a contiguous list to store the entries within a single node of a B-tree. Doing so was appropriate because the number of entries in one node is usually relatively small and because we were emulating methods that might be used in external files on a disk, where dynamic memory may not be available, and records may be stored contiguously on the disk.

Red-Black Tree In general, however, we may use any ordered structure we wish for storing the entries in each B-tree node. Small binary search trees turn out to be an excellent choice. We need only be careful to distinguish between the links within a single B-tree node and the links from one B-tree node to another. Let us therefore draw the links within one B-tree node as curly colored lines and the links between B-tree nodes as straight black lines. Figure 11.16 shows a B-tree of order 4 constructed this way.

Red-Black Trees as B-Trees of Order 4

Red-Black Trees as B-Trees of Order 4

Red-Black Trees as B-Trees of Order 4

Red-Black Tree Hence we obtain the fundamental definition of this section: A red-black tree is a binary search tree, with links colored red or black, obtained from a B-tree of order 4 in the way just described. After we have converted a B-tree into a red-black tree, we can use it like any other binary search tree. In particular, searching and traversal of a red-black tree are exactly the same as for an ordinary binary search tree; we simply ignore the color of the links. Insertion and deletion, however, require more care to maintain the underlying B-tree structure.

Red-Black Trees as Binary Search Trees

Red-Black Trees as Binary Search Trees

Analysis of Red-Black Trees

Red-Black Insertion

Red-Black Insertion

Red-Black Insertion

Red-Black Insertion

Pointers and Pitfalls