Graph and String Matching


1 Graph and String Matching

2 String Matching Problem
Given a text string T of length n and a pattern string P of length m, the exact string matching problem is to find all occurrences of P in T.
Example: T=“AGCTTGA”, P=“GCT”
Applications:
– Searching keywords in a file
– Search engines
– Database searching

3 Terminologies
S=“AGCTTGA”
|S|=7, the length of S
Substring: S_{i,j} = S_i S_{i+1} … S_j
– Example: S_{2,4}=“GCT”
Subsequence of S: a string obtained by deleting zero or more characters from S
– “ACT” and “GCTT” are subsequences.
Prefix of S: S_{1,k}
– “AGCT” is a prefix of S.
Suffix of S: S_{h,|S|}
– “CTTGA” is a suffix of S.
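The definitions above can be checked with a short Python sketch. The helper names `substring` and `is_subsequence` are hypothetical, introduced only for illustration; the slides use 1-based inclusive indices, while Python slices are 0-based and end-exclusive, so the helper converts between the two.

```python
def substring(s: str, i: int, j: int) -> str:
    """Return S_{i,j} using the slides' 1-based, inclusive indices."""
    return s[i - 1:j]


def is_subsequence(x: str, s: str) -> bool:
    """True if x can be obtained from s by deleting zero or more characters."""
    remaining = iter(s)
    # `ch in remaining` consumes the iterator, so matches must appear in order.
    return all(ch in remaining for ch in x)


S = "AGCTTGA"
print(len(S))                     # |S|
print(substring(S, 2, 4))         # S_{2,4}
print(is_subsequence("ACT", S))   # subsequence check
print(S.startswith("AGCT"))       # prefix check
print(S.endswith("CTTGA"))        # suffix check
```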

4 String Matching
Given a pattern P[1..m] and a text T[1..n], find all occurrences of P in T. Both P and T are strings over a finite alphabet Σ, that is, they belong to Σ*.
P occurs with shift s (beginning at position s+1) if P[1]=T[s+1], P[2]=T[s+2], …, P[m]=T[s+m]. If so, s is called a valid shift; otherwise, it is an invalid shift.
Note: occurrences may overlap. For P=abab and T=abcabababbc, P occurs at s=3 and s=5.

5 An example of string matching

6 Naïve string matching Running time: O((n-m+1)m).
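The slide's code did not survive in the transcript, so the following is only a minimal Python sketch of the naive approach, assuming the usual formulation: try each of the n-m+1 candidate shifts and compare up to m characters at each one. The function name `naive_match` is an assumption. Returned shifts are 0-based indices, which coincide with the shift s defined earlier (an occurrence with shift s begins at 1-based position s+1).

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # n-m+1 candidate shifts
        j = 0
        # Compare character by character: at most m comparisons per shift.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            shifts.append(s)
    return shifts


print(naive_match("abcabababbc", "abab"))
```

The two nested loops are where the O((n-m+1)m) bound comes from.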

7 A Brute-Force Algorithm
Time: O(mn), where m=|P| and n=|T|.

8 Rabin-Karp
The Rabin-Karp string searching algorithm calculates a hash value for the pattern and for each M-character substring of the text to be compared. If the hash values are unequal, the algorithm moves on and calculates the hash value of the next M-character substring. If the hash values are equal, the algorithm does a brute-force comparison between the pattern and that M-character substring. In this way, there is only one hash comparison per text substring, and brute force is needed only when the hash values match. Perhaps an example will clarify some things...

9 Rabin-Karp Example
Hash value of “AAAAA” is 37
Hash value of “AAAAH” is 100

10 Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
while (not end of text and brute force comparison == false)
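The slides do not say which hash function is used, so the sketch below assumes a polynomial rolling hash with base 256 and a prime modulus 1_000_003; both values, and the function name `rabin_karp`, are illustrative choices rather than anything taken from the presentation. Unlike the pseudocode above, which stops at the first match, this version keeps scanning and returns every valid shift (0-based).

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return all shifts where pattern occurs, using a rolling hash (assumed)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)        # weight of the window's first character
    hash_p = hash_t = 0
    for i in range(m):                  # hash the pattern and the first window
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod
    shifts = []
    for s in range(n - m + 1):
        # Brute-force check only when the hash values agree.
        if hash_p == hash_t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Slide one character over: drop text[s], append text[s + m].
            hash_t = ((hash_t - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return shifts


print(rabin_karp("abcabababbc", "abab"))
```

Updating the hash in constant time per step is what keeps the hash comparisons cheap; the brute-force check guards against two different substrings sharing a hash value.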

11 What is a graph?
A set of vertices and edges
– Directed/Undirected
– Weighted/Unweighted
– Cyclic/Acyclic

12 Representation of Graphs
Adjacency Matrix
– A V x V array, with matrix[i][j] storing whether there is an edge between the i-th vertex and the j-th vertex
Adjacency Linked List
– One linked list per vertex, each storing directly reachable vertices
Edge List

13 Representation of Graphs
                                          Adjacency Matrix   Adjacency Linked List   Edge List
Memory storage                            O(V^2)             O(V+E)
Check whether (u,v) is an edge            O(1)               O(deg(u))
Find all adjacent vertices of a vertex u  O(V)               O(deg(u))
deg(u): the number of edges connecting vertex u
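As a rough illustration of the three representations, the sketch below builds an adjacency matrix and adjacency lists from an edge list. The sample graph, the variable names, and the use of Python lists in place of linked lists are all assumptions made for the example; the graph is taken to be undirected and unweighted.

```python
from collections import defaultdict

V = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]    # edge list (hypothetical graph)

# Adjacency matrix: V x V booleans, O(V^2) memory.
matrix = [[False] * V for _ in range(V)]
# Adjacency lists: one list per vertex, O(V+E) memory in total.
adj = defaultdict(list)

for u, v in edges:
    matrix[u][v] = matrix[v][u] = True      # undirected: store both directions
    adj[u].append(v)
    adj[v].append(u)


def has_edge_matrix(u: int, v: int) -> bool:
    return matrix[u][v]                     # O(1) lookup


def has_edge_list(u: int, v: int) -> bool:
    return v in adj[u]                      # scans deg(u) entries


def neighbours_matrix(u: int) -> list[int]:
    return [v for v in range(V) if matrix[u][v]]   # scans a whole row: O(V)


print(has_edge_matrix(0, 3), has_edge_list(0, 1))
print(neighbours_matrix(2), adj[2])
```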

14 Graph Searching Why do we do graph searching? What do we search for? What information can we find from graph searching? How do we search the graph? Do we need to visit all vertices? In what order?

15 Depth-First Search (DFS)
Strategy: Go as far as you can (to vertices you have not visited yet); otherwise, go back and try another way.

16 Implementation
DFS (vertex u) {
    mark u as visited
    for each vertex v directly reachable from u
        if v is unvisited
            DFS (v)
}
Initially all vertices are marked as unvisited
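The recursive procedure above translates almost line for line into Python. In this sketch the graph is assumed to be a dictionary of adjacency lists, and the `order` list is an addition for illustration that records the sequence in which vertices are first visited; neither detail comes from the slides.

```python
def dfs(graph: dict[int, list[int]], u: int,
        visited: set[int], order: list[int]) -> None:
    """Depth-first search from u, recording the visiting order."""
    visited.add(u)                      # mark u as visited
    order.append(u)
    for v in graph[u]:                  # each vertex directly reachable from u
        if v not in visited:
            dfs(graph, v, visited, order)


# Hypothetical undirected graph: a 4-cycle 0-1-3-2-0.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
order: list[int] = []
dfs(graph, 0, set(), order)             # initially no vertex is visited
print(order)
```

For large or deep graphs the recursion may exceed Python's default recursion limit, in which case an explicit stack is a common alternative.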

17 Breadth-First Search (BFS)
Instead of going as far as possible along one path, BFS extends all paths one step at a time. BFS uses a queue to store vertices that have been visited but not yet expanded, and always expands the path from the earliest visited vertex.

18 Simulation of BFS Queue:

19 Implementation
while queue Q not empty
    dequeue the first vertex u from Q
    for each vertex v directly reachable from u
        if v is unvisited
            enqueue v to Q
            mark v as visited
Initially all vertices except the start vertex are marked as unvisited and the queue contains the start vertex only
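A minimal Python sketch of the loop above, assuming the graph is a dictionary of adjacency lists and using `collections.deque` as the queue. The function name `bfs` and the returned visiting order are illustrative additions, not part of the slides.

```python
from collections import deque


def bfs(graph: dict[int, list[int]], start: int) -> list[int]:
    """Breadth-first search from start, returning the order of expansion."""
    visited = {start}                   # only the start vertex is visited
    queue = deque([start])              # the queue holds the start vertex only
    order = []
    while queue:                        # while queue Q not empty
        u = queue.popleft()             # dequeue the first vertex
        order.append(u)
        for v in graph[u]:              # each vertex directly reachable from u
            if v not in visited:
                visited.add(v)          # mark on enqueue to avoid duplicates
                queue.append(v)
    return order


# Hypothetical undirected graph: a 4-cycle 0-1-3-2-0.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(graph, 0))
```

Marking a vertex as visited when it is enqueued, rather than when it is dequeued, ensures each vertex enters the queue at most once.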

