
1 CSC 213 Lecture 18: Tries

2 Announcements
- Quiz results are getting better, but are still not very good
- Average score on the last quiz was 5.5
- Every student who read the book did better than the students who had not done the reading

3 Preprocessing Strings
- Pattern matching can be improved by preprocessing the pattern
  - After preprocessing, KMP pattern matching takes O(n) time, where n is the size of the text
- For large, unchanging, and often-searched text, it is better to preprocess the text rather than the pattern
- A trie is a compact data structure representing a set of strings
  - E.g., all words in a text or all identifiers in a program
- Supports pattern matching queries in O(m) time for a fixed-size alphabet, where m is the size of the pattern

4 Standard Tries (§ 11.3.1)
The standard trie representing a set of strings S is an ordered tree such that:
- Every node except the root is labeled with a character
- Each node's children are ordered alphabetically
- Concatenating the characters along the path from the root to an external node yields a string in S
- Every string in S is encoded within the trie

5 Standard Tries
Example: standard trie for the set of strings S = { bear, bid, buy, bell, sell }
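
The example set S can be turned into a small working trie. The following is a minimal sketch, not the textbook's implementation: the names `TrieNode`, `StandardTrie`, `contains`, and `words` are illustrative, and the `is_word` flag is an added assumption so that a string that is a prefix of another string can still be stored. Children are kept in a dictionary and sorted only when listing, which stands in for the alphabetical child ordering.

```python
class TrieNode:
    """One node of a standard trie; children are keyed by character."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if the path from the root spells a string in S


class StandardTrie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        """Walk down from the root, creating missing nodes along the way."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Follow one child per character; fail as soon as a child is missing."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def words(self, node=None, prefix=""):
        """Yield the stored strings in alphabetical order."""
        node = self.root if node is None else node
        if node.is_word:
            yield prefix
        for ch in sorted(node.children):
            yield from self.words(node.children[ch], prefix + ch)


trie = StandardTrie(["bear", "bid", "buy", "bell", "sell"])
print(trie.contains("bell"), trie.contains("be"))
print(list(trie.words()))
```

Each lookup visits one node per character of the query, so its cost depends on the length of the query string rather than on how many strings are stored.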

6 Analysis of Standard Tries
- A standard trie uses O(n) space
- Searches, insertions, and deletions take O(dm) time
where:
- n is the total size of the strings in S
- m is the size of the string used in the operation
- d is the size of the alphabet for S

7 Word Matching using Trie
- Each leaf stores the locations in the text where its word occurs
- Example text (characters indexed 0 to 88): "see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!"
[Figure: the indexed text and the trie built from its words]
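
Word matching with stored locations can be sketched as follows. This is an illustration under assumptions: the sample text is reconstructed from the slide's figure (89 characters, indices 0 to 88), the trie is a plain nested dictionary, and the `"$"` key is a hypothetical marker holding the start indices of the word that ends at a node. The function names are also made up for this sketch.

```python
import re


def build_word_index(text):
    """Insert every lowercase word of the text into a nested-dict trie.

    The "$" entry of a node lists the start indices of the word ending there.
    """
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        node = root
        for ch in match.group():
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(match.start())
    return root


def find_word(root, word):
    """Return the start indices of `word` in the indexed text ([] if absent)."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get("$", [])


text = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")
index = build_word_index(text)
print(find_word(index, "stock"))
print(find_word(index, "see"))
```

The text is scanned once to build the index; afterwards each query only walks the trie and never touches the text again.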

8 Compressed Tries (§ 11.3.2)
- Every internal node has at least 2 children
- Obtained by compressing chains of “redundant” nodes in a standard trie

9 Compressed Tries
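
Compression can be sketched on a nested-dictionary trie by merging every chain of single-child nodes into one edge labeled with a substring. The sketch assumes that no stored string is a prefix of another, so every string ends at a leaf; `build_standard` and `compress` are illustrative names.

```python
def build_standard(words):
    """Build a standard trie as nested dicts: one character per edge."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a copy in which each chain of single-child nodes becomes one edge."""
    out = {}
    for label, child in node.items():
        # Absorb the child while it has exactly one outgoing edge.
        while len(child) == 1:
            (next_label, child), = child.items()
            label += next_label
        out[label] = compress(child)
    return out


S = ["bear", "bid", "buy", "bell", "sell"]
compressed = compress(build_standard(S))
print(compressed)
```

In the result, every internal node below the root has at least two children, and edge labels such as "sell" and "ar" replace the chains of single characters.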

10 Compact Representation
- A compressed trie can rely on an auxiliary array of strings
- Nodes store ranges of indices into the array instead of substrings
- Uses O(s) space, where s is the number of strings in the array
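
One way to picture the index ranges: each edge label is replaced by a triple (i, j, k) standing for characters j through k of the i-th string in the array. The sketch below assumes that triple convention and starts from a hand-written compressed trie for the earlier example set; `compact` and `label_of` are hypothetical helper names.

```python
def compact(node, strings, prefix=""):
    """Replace each substring label by (i, j, k), meaning strings[i][j:k+1]."""
    out = {}
    for label, child in node.items():
        start = len(prefix)
        full = prefix + label
        # Any string that starts with the path spelled so far can supply the label.
        i = next(idx for idx, s in enumerate(strings) if s.startswith(full))
        out[(i, start, start + len(label) - 1)] = compact(child, strings, full)
    return out


def label_of(triple, strings):
    """Recover the substring that a triple refers to."""
    i, j, k = triple
    return strings[i][j:k + 1]


S = ["bear", "bid", "buy", "bell", "sell"]
compressed = {"b": {"e": {"ar": {}, "ll": {}}, "id": {}, "uy": {}}, "sell": {}}
compact_trie = compact(compressed, S)
print(compact_trie)
```

Every node now stores a constant amount of data, and a compressed trie over s strings has O(s) nodes, which is where the O(s) space bound comes from.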

11 Suffix Trie (§ 11.3.3)
- The compressed trie of all suffixes of a string X

12 Analysis of Suffix Tries
For the compact representation of the suffix trie of a string X of size n over an alphabet of size d:
- Needs O(n) space
- Pattern matching queries take O(dm) time, where m is the size of the pattern
- Can be constructed in O(n) time
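
The matching idea can be tried out with a deliberately simplified sketch. Note the swap: the code below builds an uncompressed suffix trie by inserting every suffix, which takes quadratic time and space, not the O(n) compact structure analyzed above. The sample string "minimize" and the function names are assumptions for illustration only.

```python
def build_suffix_trie(x):
    """Insert every suffix of x into a nested-dict trie (quadratic, uncompressed)."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def occurs(root, pattern):
    """A pattern is a substring of x exactly when it is a prefix of some suffix."""
    node = root
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return True


trie = build_suffix_trie("minimize")
print(occurs(trie, "nimi"), occurs(trie, "zim"))
```

A query follows at most one edge per pattern character, so its running time depends on the pattern size m and not on the size of X.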

13 Greedy Method Technique
The greedy method is a paradigm built upon the following elements:
- Configurations: the different choices, collections, or values possible
- Objective function: assigns a score to each configuration
We use this technique to solve problems that ask for the best possible configuration
- E.g., the goal is to minimize or maximize the score

14 Greedy Method Technique
Works best for problems with the greedy-choice property:
- The problem has a globally optimal solution
- That solution can always be found through a series of local improvements from any starting configuration

15 Text Compression (§ 11.4)
- Efficiently encode a string X into a smaller string Y
  - Saves memory and the bandwidth needed to transmit a document
- Text compression problems are found everywhere
  - Internet transmissions
  - Fax machines

16 Text Compression (§ 11.4)
A good approach: Huffman encoding
- Compute the frequency, f(c), of each character c
  - High-frequency character → short code word
  - Low-frequency character → long code word
- No code word is a prefix of any other code word
- An optimal encoding tree determines the code words
- Uses the greedy method technique

17 Encoding Tree Example
- A code maps each character to a binary code word
- A prefix code is a code in which no code word is a prefix of another
- An encoding tree represents a prefix code
  - External nodes store the characters of the alphabet
  - A character's code word is the path from the root to its leaf
  - Use 0 when taking the left child and 1 when taking the right child
[Figure: encoding tree for the code a = 00, b = 010, c = 011, d = 10, e = 11]
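
The prefix property is what makes decoding unambiguous, which a short sketch can show. The code table below (a = 00, b = 010, c = 011, d = 10, e = 11) is read off the slide's figure and should be treated as an assumption; the encoding tree is modeled as nested dictionaries keyed by "0" and "1", and all function names are illustrative.

```python
CODE = {"a": "00", "b": "010", "c": "011", "d": "10", "e": "11"}


def build_tree(code):
    """Build the encoding tree: internal nodes are dicts, leaves are characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root


def encode(text, code):
    """Concatenate the code words of the characters."""
    return "".join(code[ch] for ch in text)


def decode(bits, tree):
    """Walk from the root; emit a character and restart whenever a leaf is reached."""
    out, node = [], tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)


TREE = build_tree(CODE)
bits = encode("abcde", CODE)
print(bits, decode(bits, TREE))
```

Because no code word is a prefix of another, the decoder never needs to look ahead or backtrack: reaching a leaf always ends exactly one code word.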

18 Huffman’s Algorithm
- Given a string X, constructs a prefix code that minimizes the size of the encoded string
- Runs in O(n + d log d) time
  - n is the size of X
  - d is the number of distinct characters in X
- Uses a priority queue as an auxiliary structure

19 Huffman’s Algorithm
Algorithm HuffmanEncoding(String X)
    C ← distinctCharacters(X)
    computeFrequencies(C, X)
    Q ← new Heap()
    for each c ∈ C
        T ← new Tree(c)
        Q.insert(getFrequency(c), T)
    while Q.size() > 1
        f1 ← Q.minKey()
        T1 ← Q.removeMin()
        f2 ← Q.minKey()
        T2 ← Q.removeMin()
        T ← join(T1, T2)
        Q.insert(f1 + f2, T)
    return Q.removeMin()
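
The pseudocode maps fairly directly onto Python's `heapq` module. The following is a sketch under assumptions rather than the course's reference solution: trees are plain tuples `(left, right)` with single characters as leaves, a counter is added as a tie-breaker so the heap never has to compare two trees, and the input string is assumed to be non-empty. `huffman_tree` and `code_words` are illustrative names.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_tree(x):
    """Build a Huffman tree for the non-empty string x."""
    freq = Counter(x)   # O(n) frequency count
    tiebreak = count()  # unique sequence numbers keep heap entries comparable
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy step: join the two trees with the smallest total frequency.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    return heap[0][2]


def code_words(tree, prefix=""):
    """Read code words off the tree: 0 for a left child, 1 for a right child."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a one-character alphabet still needs one bit
    left, right = tree
    codes = code_words(left, prefix + "0")
    codes.update(code_words(right, prefix + "1"))
    return codes


x = "abracadabra"
codes = code_words(huffman_tree(x))
encoded_bits = sum(len(codes[ch]) for ch in x)
print(codes, encoded_bits)
```

For X = abracadabra the most frequent character, a, receives a one-bit code word and the whole string encodes to 23 bits. The exact bit patterns depend on how ties are broken, but the total length does not.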

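20 Example
X = abracadabra
Frequencies: a = 5, b = 2, c = 1, d = 1, r = 2
[Figure: Huffman tree construction. c (1) and d (1) are joined into a tree of weight 2; b (2) and r (2) into a tree of weight 4; those two trees into a tree of weight 6; finally a (5) and the weight-6 tree into the final tree of weight 11]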

21 Announcements #2
- Midterms were good, however
  - Median score was 86; mean score was 84
- Will hand back at end of lecture

