Generic Trees—Trie, Compressed Trie, Suffix Trie (with Analysi

Slides:



Advertisements
Similar presentations
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Advertisements

Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Tries Standard Tries Compressed Tries Suffix Tries.
Tries Search for ‘bell’ O(n) by KMP algorithm O(dm) in a trie Tries
Alon Efrat Computer Science Department University of Arizona Suffix Trees.
Advanced Algorithm Design and Analysis (Lecture 4) SW5 fall 2004 Simonas Šaltenis E1-215b
Modern Information Retrieval
Goodrich, Tamassia String Processing1 Pattern Matching.
IP Address Lookup for Internet Routers Using Balanced Binary Search with Prefix Vector Author: Hyesook Lim, Hyeong-gee Kim, Changhoon Publisher: IEEE TRANSACTIONS.
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
Fall 2007CS 2251 Trees Chapter 8. Fall 2007CS 2252 Chapter Objectives To learn how to use a tree to represent a hierarchical organization of information.
Indexed Search Tree (Trie) Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Digital Search Trees & Binary Tries Analog of radix sort to searching. Keys are binary bit strings.  Fixed length – 0110, 0010, 1010,  Variable.
Department of Computer Eng. & IT Amirkabir University of Technology (Tehran Polytechnic) Data Structures Lecturer: Abbas Sarraf Search.
6/26/2015 7:13 PMTries1. 6/26/2015 7:13 PMTries2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3) Huffman encoding.
Recursion Recitation – 11/(6,7)/2008 CS 180 Department of Computer Science, Purdue University.
Review Binary Tree Binary Tree Representation Array Representation Link List Representation Operations on Binary Trees Traversing Binary Trees Pre-Order.
Introduction n – length of text, m – length of search pattern string Generally suffix tree construction takes O(n) time, O(n) space and searching takes.
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
Spring 2010CS 2251 Trees Chapter 6. Spring 2010CS 2252 Chapter Objectives Learn to use a tree to represent a hierarchical organization of information.
Tries1. 2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3)
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Tries Data Structure. Tries  Trie is a special structure to represent sets of character strings.  Can also be used to represent data types that are.
1 Lexicographic Search:Tries All of the searching methods we have seen so far compare entire keys during the search Idea: Why not consider a key to be.
Contents What is a trie? When to use tries
Advanced Data Structures Lecture 8 Mingmin Xie. Agenda Overview Trie Suffix Tree Suffix Array, LCP Construction Applications.
Instructor: Lilian de Greef Quarter: Summer 2017
Tries 4/16/2018 8:59 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Data Structures and Design in Java © Rick Mercer
CSCE 210 Data Structures and Algorithms
BCA-II Data Structure Using C
Tries 07/28/16 11:04 Text Compression
Tries 5/27/2018 3:08 AM Tries Tries.
Binary Search Tree A Binary Search Tree is a binary tree that is either an empty or in which every node contain a key and satisfies the following conditions.
Higher Order Tries Key = Social Security Number.
Azita Keshmiri CS 157B Ch 12 indexing and hashing
IP Routers – internal view
Data Structure and Algorithms
Mark Redekopp David Kempe
Lecture Trees Chapter 9 of textbook 1. Concepts of trees
AVL Tree.
Chapter 10.1 Binary Search Trees
Lecture 22 Binary Search Trees Chapter 10 of textbook
Lecture 18. Basics and types of Trees
Tries 9/14/ :13 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
abstract containers sequence/linear (1 to 1) hierarchical (1 to many)
External Methods Chapter 15 (continued)
Digital Search Trees & Binary Tries
Binary Tree and General Tree
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
Tries A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes,
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
Indexing and Hashing Basic Concepts Ordered Indices
Advanced Algorithms Analysis and Design
Advance Database System
Digital Search Trees & Binary Tries
Data Structures and Analysis (COMP 410)
Chapter 11 Indexing And Hashing (1)
Data Structures and Algorithm Analysis Trees
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
Tries 2/23/2019 8:29 AM Tries 2/23/2019 8:29 AM Tries.
Tries 2/27/2019 5:37 PM Tries Tries.
(2,4) Trees /6/ :26 AM (2,4) Trees (2,4) Trees
Trees in java.util A set is an object that stores unique elements
Tree A tree is a data structure in which each node is comprised of some data as well as node pointers to child nodes
Huffman Coding Greedy Algorithm
Binary Search Trees < > = Dictionaries
Presentation transcript:

Generic Trees—Trie, Compressed Trie, Suffix Trie (with Analysi

Introduction Trees that store things more efficiently, such as a binary search tree. A tree is a general structure of recursive nodes. There are many types of tree. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, retrieval tree. We use a trie to store pieces of data that have a key (used to identify the data) and possibly a value (which holds any additional data associated with the key). Here, we will use data whose keys are strings.

Trie Figure shows 4-way search tree In computer science a trie, or pre fix tree is an ordered multi-way tree data structure that is used to store strings over an alphabet. A multiway tree is a tree that can have more than two children. A multiway tree of order m (or an m-way tree) is one in which a tree can have m children. Figure shows 4-way search tree A trie is a tree data structure that allows strings with similar character prefixes to use the same prefix data and store only the tails as separate data. One character of the string is stored at each level of the tree, with the first character of the string stored at the root The term trie comes from "retrieval

Trie

Trie A trie is useful when we want to make a search in 1000 string stored in it. Some applications are: A trie can also be used to replace a hash table. A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Trie can be used for Term indexing Spell checking. Word completion Data compression Lexographical ordering Routing table for IP addresses Storing/Querying XML documents etc.

Trie - implementation A trie tree uses the property that if two strings have a common prefix of length n then these two will have a common path in the trie tree till the length n. typedef struct trie_node { bool NotLeaf; trie_node *pChildren[NR]; var_type word[20]; }node; where : #define NR 27 // the alphabet(26 letters) plus blank. typedef char var_type // the key is a set of characters

Trie - Operations Searching Insertion Deletion Traversing

Trie You can see in the below picture as to how a trie data structure looks like for key, value pairs ("abc",1),("xy",2),("xyz",5),("abb",9)("xyzb",8), ("word",5). Each node including root has 'n' number of children, where 'n' is total number of possible alphabets out of which a key would be formed. In this example, if we assume that a key would not consist of any other alphabets than characters 'a' to 'z' then value of 'n' would be 26. Each node therefore would point to 26 other nodes. Each node is basically an alphabet in the path from root node to leaf node(which stores value for that key). For example, to search the record associated with key 'abc',  we start from root node then go to 0th child since first character in key 'abc' has index of 0 in alphabets. By taking 0th index, we reach node 'a', at this point we go to 1st child and reach node 'b', here we take 2nd path to child and go to node 'c'. Now to reach node 'c', in total we have visited node 'a' then node 'b' followed by node 'c', in other words key 'abc' is visited completely and hence we stop at node 'c' and return the value stored at this node.

Trie - Insertion Refer Figure in Slide 8 Insertion: Let's say we want to insert a key-value pair ("abc",1) into the trie. 1. We go to root node. 2. Get the index of the first character ('a') of "abc". That would be 0 in our alphabet system. 3. Go to 0th child of root. Because 0th child is null we first construct a TrieNode to which this 0th child would point. This newly constructed node would be node 'a'. We mark this node as current node. 4. Now we get the index of the second character of "abc". That would be 1 and therefore we go to 1st child of current node(from step #3). 5. Here again, 1st child is null. We create a new TrieNode which would be node 'b'. We mark this node as current node. 6. Now we get the index of the third character of "abc". That would be 2 and therefore we go to 2nd child current node(from step#5). 7. Here again, 2nd child is null. We create a new TrieNode which would be node 'c'. We mark this node as current node. 8. At this step, we are done reading all the characters of the given key. Hence we mark the current node that is node 'c' as leaf node and store value 1 at this leaf node.

Trie - Insertion Refer Figure in Slide 8 Insertion: Now at this point let's say we want to insert a key-value pair ("abb",9) into the trie. 1. We go to root node. 2. Get the index of the first character of "abb". That would be 0 in our alphabet system. 3. Go to 0th child of root. Now as you can notice, 0th child won't be null since we have constructed node 'a' in the previous insertion sequence. We mark this node 'a' as the current node. 4. Now we get the index of the second character('b') of "abb". That would be 1, we go to 1st child of current node(from step #3). 5. 1st child of current node which is node 'b' is not null. We mark this node 'b' as current node. We now get the index of the last character ('b') of "abb". That index would be 1 and hence we go to 1st child of current node which is null. We create a new TrieNode which would be node 'b'. Current node now points to this newly created node. 6. We are done reading all characters if given key "abb". We mark current node as leaf node and store value 9 in it.

Trie - Searching Refer Figure in Slide 8 Search algorithm steps: Example-1 : searching for non-existing key "ac” 1. Go to root node. 2. Pick the first character of key "ac" which would be 'a'. Find out its index using alphabet system in use. 3. Index returned would be 0, go to 0th child of root which is node 'a'. Mark this node as current node. 4. Pick the second character of key "ac" which would be 'c'. Its index would be 2 and therefore we go to 2nd child of current node. 5. Now at this point, we find out that 2nd child of current node is null. If you notice, from insertion algorithm of a given key, no node in the key-path from the root could be null. If it is null then that implies that this key was never inserted in the trie. And therefore in such cases, we return 'KEY_NOT_FOUND'. Example-2 : searching for an existing key "abb“ The steps are very similar to example-1. We keep on reading characters of given key and according to indices of characters, travel from root node to node 'b' which is at level-3 (if root is at level-0). At this point we would have read all the characters of the key "abb" and hence we return the value stored at this node.  

Trie - Deletion Refer Figure in Slide 8 Algorithm requirements for deleting key 'k': 1. If key 'k' is not present in trie, then we should not modify trie in any way. 2. If key 'k' is not a prefix nor a suffix of any other key and nodes of key 'k' are not part of any other key then all the nodes starting from root node(excluding root node) to leaf node of key 'k' should be deleted. For example, in the above trie if we were asked to delete key - "word", then nodes 'w','o','r','d' should be deleted. 3. If key 'k' is a prefix of some other key, then leaf node corresponding to key 'k' should be marked as 'not a leaf node'. No node should be deleted in this case. For example, in the above trie if we have to delete key - "xyz", then without deleting any node we have to simply mark node 'z' as 'not a leaf node' and change its value to "NON_VALUE" 4. If key 'k' is a suffix of some other key 'k1', then all nodes of key 'k' which are not part of key 'k1' should be deleted. For example, in the above trie if we were to delete key - "xyzb", then we should only delete node "b" of key "xyzb" since other nodes of this key are also part of key "xyz". 5. If key 'k' is not a prefix nor a suffix of any other key but some nodes of key 'k' are shared with some other key 'k1', then nodes of key 'k' which are not common to any other key should be deleted and shared nodes should be kept intact. For example, in the above trie if we have to delete key "abc" which shares node 'a', node 'b' with key "abb", then the algorithm should delete only node 'c' of key "abc" and should not delete node 'a' and node 'b'.

Refer Figure in Slide 8 Compressed Tries  To overcome the disadvantages of standard trie – space requirement Trie with nodes of degree at least 2  Obtained from standard trie by compressing chains of redundant nodes  Standard Trie: Compressed Tries

Search for the string bbaabb

Upload Report in moodle for the following Questions 1. Compare Tree with Trie 2. Analyse the complexity of various operations in standard Trie. 3. Explain the use of Trie in Routers. Give relevant figure. 4. Explain the compact representation of compressed and suffix trie. 5. Give the implementation of Trie.