Mark Redekopp David Kempe

Slides:



Advertisements
Similar presentations
CS50 WEEK 6 Kenny Yu.
Advertisements

M. Waldvogel, G. Varghese, J. Turner, B. Plattner Presenter: Shulin You UNIVERSITY OF MASSACHUSETTS, AMHERST – Department of Electrical and Computer Engineering.
IP Routing Lookups Scalable High Speed IP Routing Lookups.
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Tries Standard Tries Compressed Tries Suffix Tries.
Tries Search for ‘bell’ O(n) by KMP algorithm O(dm) in a trie Tries
Alon Efrat Computer Science Department University of Arizona Suffix Trees.
Advanced Algorithm Design and Analysis (Lecture 4) SW5 fall 2004 Simonas Šaltenis E1-215b
Digital Search Trees & Binary Tries Analog of radix sort to searching. Keys are binary bit strings.  Fixed length – 0110, 0010, 1010,  Variable.
IP Address Lookup for Internet Routers Using Balanced Binary Search with Prefix Vector Author: Hyesook Lim, Hyeong-gee Kim, Changhoon Publisher: IEEE TRANSACTIONS.
Higher Order Tries Key = Social Security Number.   9 decimal digits. 10-way trie (order 10 trie) Height
Design a Data Structure Suppose you wanted to build a web search engine, a la Alta Vista (so you can search for “banana slugs” or “zyzzyvas”) index say.
Digital Search Trees & Binary Tries Analog of radix sort to searching. Keys are binary bit strings.  Fixed length – 0110, 0010, 1010,  Variable.
1 HEXA: Compact Data Structures or Faster Packet Processing Author: Sailesh Kumar, Jonathan Turner, Patrick Crowley, Michael Mitzenmacher. Publisher: ICNP.
6/26/2015 7:13 PMTries1. 6/26/2015 7:13 PMTries2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3) Huffman encoding.
Data Structures & Algorithms Radix Search Richard Newman based on slides by S. Sahni and book by R. Sedgewick.
Address Lookup in IP Routers. 2 Routing Table Lookup Routing Decision Forwarding Decision Forwarding Decision Routing Table Routing Table Routing Table.
IP Address Lookup Masoud Sabaei Assistant professor
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Lecture 12 : Trie Data Structure Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University.
Tries1. 2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3)
Higher Order Tries Key = Social Security Number.   9 decimal digits. 10-way trie (order 10 trie) Height
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Contents What is a trie? When to use tries
Search Radix search trie (RST) R-way trie (RT) De la Briandias trie (DLB)
Advanced Data Structures Lecture 8 Mingmin Xie. Agenda Overview Trie Suffix Tree Suffix Array, LCP Construction Applications.
Generic Trees—Trie, Compressed Trie, Suffix Trie (with Analysi
Tries 4/16/2018 8:59 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Searching, Maps,Tries (hashing)
15-853:Algorithms in the Real World
Data Structures and Analysis (COMP 410)
COMP261 Lecture 23 B Trees.
COMP261 Lecture 22 Data Compression 2.
Tries 07/28/16 11:04 Text Compression
Record Storage, File Organization, and Indexes
Tries 5/27/2018 3:08 AM Tries Tries.
CSC 421: Algorithm Design & Analysis
Higher Order Tries Key = Social Security Number.
COMP 53 – Week Eleven Hashtables.
IP Routers – internal view
CSCI 210 Data Structures and Algorithms
B+-Trees.
B+-Trees.
B+-Trees.
CSCI Trees and Red/Black Trees
Tries 9/14/ :13 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
13 Text Processing Hongfei Yan June 1, 2016.
Radix search trie (RST) R-way trie (RT) De la Briandias trie (DLB)
HEXA: Compact Data Structures for Faster Packet Processing
Digital Search Trees & Binary Tries
Tree data structure.
Strings: Tries, Suffix Trees
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
CSCE350 Algorithms and Data Structure
Tries A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes,
Tree data structure.
Hash Tables Chapter 12.7 Wherein we throw all the data into random array slots and somehow obtain O(1) retrieval time Nyhoff, ADTs, Data Structures and.
Digital Search Trees & Binary Tries
Higher Order Tries Key = Social Security Number.
Data Structures and Analysis (COMP 410)
Tries 2/23/2019 8:29 AM Tries 2/23/2019 8:29 AM Tries.
Suffix Trees String … any sequence of characters.
Tries 2/27/2019 5:37 PM Tries Tries.
Strings: Tries, Suffix Trees
CSCI 104 Skip Lists Mark Redekopp.
Mark Redekopp David Kempe
Sequences 5/17/ :43 AM Pattern Matching.
Tree A tree is a data structure in which each node is comprised of some data as well as node pointers to child nodes
EECE.3220 Data Structures Instructor: Dr. Michael Geiger Spring 2019
Lecture-Hashing.
Presentation transcript:

Mark Redekopp David Kempe CSCI 104 Tries Mark Redekopp David Kempe

Tries

Review of Set/Map Again Recall the operations a set or map performs… Insert(key) Remove(key) find(key) : bool/iterator/pointer Get(key) : value [Map only] We can implement a set or map using a binary search tree Search = O(_________) But what work do we have to do at each node? Compare (i.e. string compare) How much does that cost? Int = O(1) String = O( m ) where m is length of the string Thus, search costs O( ____________ ) "help" "hear" "ill" "heap" "help" "in"

Review of Set/Map Again Recall the operations a set or map performs… Insert(key) Remove(key) find(key) : bool/iterator/pointer Get(key) : value [Map only] We can implement a set or map using a binary search tree Search = O( log(n) ) But what work do we have to do at each node? Compare (i.e. string compare) How much does that cost? Int = O(1) String = O( m ) where m is length of the string Thus, search costs O( m * log(n) ) "help" "hear" "ill" "heap" "help" "in"

Review of Set/Map Again We can implement a set or map using a hash table Search = O( 1 ) But what work do we have to do once we hash? Compare (i.e. string compare) How much does that cost? Int = O(1) String = O( m ) where m is length of the string Thus, search costs O( m ) "help" Conversion function 2 1 2 3 4 5 heal help ill hear 3.45

Tries Assuming unique keys, can we still achieve O(m) search but not have collisions? O(m) means the time to compare is independent of how many keys (i.e. n) are being stored and only depends on the length of the key Trie(s) (often pronounced "try" or "tries") allow O(m) retrieval Sometimes referred to as a radix tree or prefix tree Consider a trie for the keys "HE", "HEAP", "HEAR", "HELP", "ILL", "IN" - H I H I L N E E L N A L L A L L P R P P R P

Tries Rather than each node storing a full key value, each node represents a prefix of the key Highlighted nodes indicate terminal locations For a map we could store the associated value of the key at that terminal location Notice we "share" paths for keys that have a common prefix To search for a key, start at the root consuming one unit (bit, char, etc.) of the key at a time If you end at a terminal node, SUCCESS If you end at a non-terminal node, FAILURE - H I H I L N E E L N A L L A L L P R P P R P

A "value" type could be stored for each non-terminal node Tries To search for a key, start at the root consuming one unit (bit, char, etc.) of the key at a time If you end at a terminal node, SUCCESS If you end at a non-terminal node, FAILURE Examples: Search for "He" Search for "Help" Search for "Head" Search takes O(m) where m = length of key Notice this is the same as a hash table - H I H I L N E E L N A L L A L L P R P P R P A "value" type could be stored for each non-terminal node

Your Turn Construct a trie to store the set of words Ten Tent Then Tense Tens Tenth

Application: IP Lookups Network routers form the backbone of the Internet Incoming packets contain a destination IP address (128.125.73.60) Routers contain a "routing table" mapping some prefix of destination IP address to output port 128.125.x.x => Output port C 128.209.32.x => Output port B 128.x.x.x => Output port D 132.x.x.x => Output port A Keys = Match the longest prefix Keys are unique Value = Output port Octet 1 Octet 2 Octet 3 Port 10000000 01111101 C 11010001 00100000 B D 10000100 A

IP Lookup Trie A binary trie implies that the Routing Table: - - - Left child is for bit '0' Right child is for bit '1' Routing Table: 128.125.x.x => Output port C 128.209.32.x => Output port B 128.209.44.x => Output port D 132.x.x.x => Output port A 1 … 1 - Octet 1 Octet 2 Octet 3 Port 10000000 01111101 C 11010001 00100000 B D 10000100 A D A 1 - - … C B

Structure of Trie Nodes What do we need to store in each node? Depends on how "dense" or "sparse" the tree is? Dense (most characters used) or small size of alphabet of possible key characters Array of child pointers One for each possible character in the alphabet Sparse (Linked) List of children Node needs to store ______ template < class V > struct TrieNode{ V* value; // NULL if non-terminal TrieNode<V>* children[26]; }; V* a b … z template < class V > struct TrieNode{ char key; V* value; TrieNode<V>* next; TrieNode<V>* children; }; c s f c f s h r h r

Search Search consumes one character at a time until Insert The end of the search key If value pointer exists, then the key is present in the map Or no child pointer exists in the TrieNode Insert Search until key is consumed but trie path already exists Set v pointer to value Search until trie path is NULL, extend path adding new TrieNodes and then add value at terminal V* search(char* k, TrieNode<V>* node) { while(*k != '\0' && node != NULL){ node = node->children[*k – 'a']; k++; } if(node){ return node->v; void insert(char* k, Value& v) { TrieNode<V>* node = root; while(*k != '\0' && node != NULL){ node = node->children[*k – 'a']; k++; } if(node){ node->v = new Value(v); } else { // create new nodes in trie // to extend path // updating root if trie is empty

SUFFIX TREES (TRIES)

Prefix Trees (Tries) Review What problem does a prefix tree solve Lookups of keys (and possible associated values) A prefix tree helps us match 1-of-n keys "He" "Help" "Hear" "Heap" "In" "Ill" Here is a slightly different problem: Given a large text string, T, can we find certain substrings or answer other queries about patterns in T A suffix tree (trie) can help here

Suffix Trie Slides http://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/suffixtrees.pdf

Suffix Trie Wrap-Up How many nodes can a suffix trie have for text, T, with length |T|? |T|2 Can we do better? Can compress paths without branches into a single node Do we need a suffix trie to find substrings or answer certain queries? We could just take a string and search it for a certain query, q But it would be slow => O(|T|) and not O(|q|)

What Have We Learned [Key Point]: Think about all the data structures we've been learning? There is almost always a trade-off of memory vs. speed i.e. Space vs. time Most data structures just exploit different points on that time-space tradeoff continuum Think about searches in your project…Do we need a map? No, we could just search all items each time a keyword is provided But think how slow that would be So we build a data structure (i.e. a map) that replicates data and takes a lot of memory space… …so that we can find data faster