CSC 212 – Data Structures Lecture 34: Strings and Pattern Matching.


1 CSC 212 – Data Structures Lecture 34: Strings and Pattern Matching

2 Problem of the Day
You drive a bus from Rotterdam to Delft. At the 1st stop, 33 people get in. At the 2nd stop, 7 more people get in, and 11 passengers leave. At the 3rd stop, 5 people leave and 2 get in. After one hour, the bus arrives in Delft. What is the name of the driver?
Read the question: You are the driver!

3 Strings
- Algorithmically, a String is just a sequence of concatenated data:
  - “CSC212 STUDENTS IN DA HOUSE”
  - “I can’t believe this is a String!”
  - Java programs
  - HTML documents
  - Digitized image
  - DNA sequences

4 Strings In Java
- Java Strings are immutable
  - Java maintains a pool (in effect, a Map) of text to String objects
- Each time a String literal is used, the pool is checked
  - If the text exists, Java uses the String object to which it is mapped
  - Otherwise, it makes a new String & adds the text and object to the pool
  - Strings built at run time (e.g. with new String(...)) join the pool only through intern()
- Happens “under the hood”
  - Makes String work like a primitive type
  - Also makes it cheap to do lots of text processing
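
The pooling behavior described on this slide can be observed by comparing references with ==. This is a minimal sketch, not part of the original lecture; the class name StringPoolDemo and the helper sameObject are made up for illustration. It relies on the fact that string literals are pooled automatically, while new String(...) always creates a separate object.

```java
// Illustrative sketch; StringPoolDemo and sameObject are hypothetical names.
final class StringPoolDemo {
    // True when both references point at the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "CSC212";
        String literal2 = "CSC212";           // same literal: reuses the pooled object
        String built = new String("CSC212");  // explicitly creates a distinct object

        System.out.println(sameObject(literal1, literal2));        // true
        System.out.println(sameObject(literal1, built));           // false
        System.out.println(literal1.equals(built));                // true: same text
        System.out.println(sameObject(literal1, built.intern()));  // true: intern() returns the pooled object
    }
}
```

Because two Strings with the same text are not guaranteed to be the same object, text comparisons should use equals() rather than ==.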

5 String Terminology
- String drawn from elements in an alphabet
  - ASCII or Unicode
  - Bits
  - Pixels
  - DNA bases
- Substring P[i ... j] contains characters from P[i] through P[j]
- A substring starting at rank 0 is called a prefix
- A substring ending at the string’s last rank is called a suffix

6 Suffixes and Prefixes
“I am the Lizard King!”

Prefixes                    Suffixes
"I"                         "!"
"I "                        "g!"
"I a"                       "ng!"
"I am"                      "ing!"
…                           …
"I am the Lizard Kin"       "am the Lizard King!"
"I am the Lizard King"      " am the Lizard King!"
"I am the Lizard King!"     "I am the Lizard King!"
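
The substring, prefix, and suffix definitions can be tried out on the example string. This is a rough sketch under one assumption worth noting: the slides' P[i ... j] includes both end ranks, whereas Java's String.substring(begin, end) excludes end, so the helper adds 1. The class SubstringTerms and its methods are hypothetical names, not from the lecture.

```java
// Illustrative sketch; SubstringTerms, sub, prefix and suffix are hypothetical names.
final class SubstringTerms {
    // P[i ... j] in the slides' notation: both ranks i and j are included.
    static String sub(String p, int i, int j) {
        return p.substring(i, j + 1);
    }

    // Prefix ending at rank j, i.e. P[0 ... j].
    static String prefix(String p, int j) {
        return sub(p, 0, j);
    }

    // Suffix starting at rank i, i.e. P[i ... last rank].
    static String suffix(String p, int i) {
        return sub(p, i, p.length() - 1);
    }

    public static void main(String[] args) {
        String p = "I am the Lizard King!";
        System.out.println(sub(p, 2, 3));   // am
        System.out.println(prefix(p, 3));   // I am
        System.out.println(suffix(p, 16));  // King!
        // The built-in checks agree with the definitions.
        System.out.println(p.startsWith(prefix(p, 3)) && p.endsWith(suffix(p, 16)));  // true
    }
}
```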

7 Pattern Matching Problem
- Given strings T & P, find the first substring of T matching P
  - T is the “text”
  - P is the “pattern”
- Has many, many, many applications
  - Search engines
  - Database queries
  - Biological research
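
For reference, Java's standard library already answers this question for Strings: String.indexOf(String) returns the rank of the first occurrence of the pattern, or -1 if there is none. The wrapper below is only a sketch to pin down the expected behavior; the class name FirstMatch and method find are made up for illustration.

```java
// Illustrative sketch; FirstMatch and find are hypothetical names.
final class FirstMatch {
    // Returns the rank in text where pattern first occurs, or -1 if it never occurs.
    static int find(String text, String pattern) {
        return text.indexOf(pattern);
    }

    public static void main(String[] args) {
        String t = "CSC212 STUDENTS IN DA HOUSE";
        System.out.println(find(t, "STUDENTS"));  // 7
        System.out.println(find(t, "S"));         // 1 (first of several matches)
        System.out.println(find(t, "PROF"));      // -1
    }
}
```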

8 Brute-Force Approach
- Common method of solving problems
- Easy to develop
  - Often requires little coding
  - Needs little brain power to figure out
- Uses computer’s speed for analysis
  - Examines every possible option
  - Painfully slow and uses lots of memory
  - Generally good only with small problems

9 Brute-Force Pattern Matching
- Compare P with every substring of T of length | P |, until
  - a substring of T equal to P is found -or-
  - all possible substrings of T are rejected
- If | P | = m and | T | = n, takes O(nm) time
- Worst-case:
  - T = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
  - P = aaag
  - Common case for images & DNA data
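
The O(nm) bound can be made concrete by counting character comparisons on a worst-case style input. The sketch below is an illustration, not the lecture's code: the class BruteForceCost, its methods, and the text length of 35 are all choices made for this example. With a text of n a's and the pattern aaag (m = 4), every one of the n - m + 1 alignments matches three characters and fails on the fourth, so the count is (n - m + 1) * m.

```java
// Illustrative sketch; BruteForceCost and its methods are hypothetical names.
final class BruteForceCost {
    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int k = 0; k < n; k++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Counts character comparisons made by brute-force matching,
    // stopping at the first match (if any).
    static int countComparisons(String t, String p) {
        int comparisons = 0;
        for (int i = 0; i <= t.length() - p.length(); i++) {
            int j = 0;
            while (j < p.length()) {
                comparisons++;
                if (t.charAt(i + j) != p.charAt(j)) {
                    break;  // mismatch: give up on this alignment
                }
                j++;
            }
            if (j == p.length()) {
                break;  // full match found
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String t = repeat('a', 35);  // assumed length, chosen for illustration
        System.out.println(countComparisons(t, "aaag"));  // 128 = (35 - 4 + 1) * 4
    }
}
```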

10 Brute-Force Pattern Matching
Algorithm BruteForceMatch(String T, String P)
    // Check if each rank of T starts a matching substring
    for i ← 0 to T.length() - P.length()
        // Compare substring starting at T[i] with P
        j ← 0
        while j < P.length() && T.charAt(i + j) == P.charAt(j)
            j ← j + 1
        if j == P.length()
            return i    // Return 1st place in T we find P
    return -1           // No matching substring exists
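
The pseudocode on this slide translates almost line for line into Java. The version below is one possible rendering rather than the course's official solution; the class name BruteForceMatcher and method name bruteForceMatch are assumptions made for this sketch.

```java
// Illustrative sketch; BruteForceMatcher is a hypothetical class name.
final class BruteForceMatcher {
    // Returns the first rank in t at which p occurs, or -1 if p does not occur.
    static int bruteForceMatch(String t, String p) {
        // Check whether each rank of t starts a matching substring.
        for (int i = 0; i <= t.length() - p.length(); i++) {
            // Compare the substring starting at t[i] with p.
            int j = 0;
            while (j < p.length() && t.charAt(i + j) == p.charAt(j)) {
                j++;
            }
            if (j == p.length()) {
                return i;  // first place in t where p is found
            }
        }
        return -1;  // no matching substring exists
    }

    public static void main(String[] args) {
        System.out.println(bruteForceMatch("abacaabaccabacabaabb", "abacab"));  // 10
        System.out.println(bruteForceMatch("aaaaaaaaaa", "aaag"));              // -1
        System.out.println(bruteForceMatch("ab", "abc"));                       // -1 (pattern longer than text)
    }
}
```

Note that when the pattern is longer than the text the loop body never runs, so the method falls through to -1 without any bounds problems.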

11 Your Turn
- Get back into groups and do the activity

12 Before Next Lecture…
- Keep up with your reading!
  - Cannot stress this enough
- Get ready for Lab Mastery Exam
- Start thinking about questions for Final

