# Graph and String Matching


Graph and String Matching

String Matching Problem: Given a text string T of length n and a pattern string P of length m, the exact string matching problem is to find all occurrences of P in T. Example: T = “AGCTTGA”, P = “GCT”. Applications:
- searching for keywords in a file
- search engines
- database searching

Terminology: Let S = “AGCTTGA”; then |S| = 7 is the length of S.
- Substring: $S_{i,j} = S_i S_{i+1} \cdots S_j$. Example: $S_{2,4}$ = “GCT”.
- Subsequence of S: obtained by deleting zero or more characters from S. Example: “ACT” and “GCTT” are subsequences.
- Prefix of S: $S_{1,k}$. Example: “AGCT” is a prefix of S.
- Suffix of S: $S_{h,|S|}$. Example: “CTTGA” is a suffix of S.
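The four terms above can be checked on the example string with a few small helpers. This is a minimal sketch; the function names `is_substring`, `is_subsequence`, `is_prefix` and `is_suffix` are hypothetical. Python strings are 0-indexed, so $S_{2,4}$ corresponds to the slice `S[1:4]`.

```python
def is_substring(x: str, s: str) -> bool:
    """True if x occurs as a contiguous block S_{i,j} of s."""
    return x in s

def is_subsequence(x: str, s: str) -> bool:
    """True if x can be obtained by deleting zero or more characters of s."""
    it = iter(s)
    # 'c in it' advances the iterator past the first match of c,
    # so the characters of x must appear in s in the same order
    return all(c in it for c in x)

def is_prefix(x: str, s: str) -> bool:
    return s.startswith(x)   # x == S_{1,k} for k = |x|

def is_suffix(x: str, s: str) -> bool:
    return s.endswith(x)     # x == S_{h,|S|} for h = |S| - |x| + 1

S = "AGCTTGA"
print(len(S))                    # 7
print(S[1:4])                    # S_{2,4} with 1-based indices -> GCT
print(is_subsequence("ACT", S))  # True
print(is_substring("ACT", S))    # False: a subsequence, but not a substring
print(is_prefix("AGCT", S))      # True
print(is_suffix("CTTGA", S))     # True
```

“ACT” is a useful contrast case: its characters appear in order in S, but not contiguously.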

String Matching: Given a pattern P[1..m] and a text T[1..n], where both P and T belong to $\Sigma^*$, find all occurrences of P in T. P occurs with shift s, that is, beginning at position s+1 of T, if $P[1] = T[s+1], P[2] = T[s+2], \dots, P[m] = T[s+m]$ for some $0 \le s \le n - m$. If so, s is called a valid shift; otherwise, it is an invalid shift. Note that occurrences may overlap, so one occurrence can begin within another. For example, with P = abab and T = abcabababbc, P occurs at s = 3 and s = 5.

An example of string matching

Naïve string matching. Running time: O((n-m+1)m), since each of the n-m+1 possible shifts may need up to m character comparisons.

A Brute-Force Algorithm. Time: O(mn), where m = |P| and n = |T|.
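The naïve (brute-force) matcher tries every shift and compares characters until a mismatch. The name `naive_match` is hypothetical. Python indexes from 0, so a match starting at index s corresponds exactly to the shift s defined above. The result reproduces the overlapping example, P = abab in T = abcabababbc.

```python
def naive_match(T: str, P: str) -> list[int]:
    """Return every valid shift s (0 <= s <= n - m) such that P == T[s:s+m]."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):               # n - m + 1 candidate shifts
        j = 0
        while j < m and T[s + j] == P[j]:    # at most m comparisons per shift
            j += 1
        if j == m:                           # all m characters matched
            shifts.append(s)
    return shifts

print(naive_match("abcabababbc", "abab"))    # [3, 5]
print(naive_match("AGCTTGA", "GCT"))         # [1]
```

The two nested loops give the O((n-m+1)m) bound directly. The worst case occurs when almost every shift matches for nearly m characters before failing, as with P = "aaab" in a text of all "a"s.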

Rabin-Karp: The Rabin-Karp string searching algorithm calculates a hash value for the pattern and for each M-character substring of the text to be compared. If the hash values are unequal, the algorithm moves on and calculates the hash value for the next M-character substring. If the hash values are equal, the algorithm does a brute-force comparison between the pattern and that M-character substring, because different strings can share the same hash value. In this way, there is only one hash comparison per text substring, and a brute-force comparison is needed only when the hash values match. An example may clarify this.

Rabin-Karp example: the hash value of “AAAAA” is 37, and the hash value of “AAAAH” is 100.

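Rabin-Karp Algorithm (the pattern is M characters long):
- hash_p = hash value of the pattern
- hash_t = hash value of the first M letters of the text
- do
  - if (hash_p == hash_t), do a brute-force comparison of the pattern and the selected section of text
  - hash_t = hash value of the next section of text, one character over
- while (not end of text and brute-force comparison == false)

This loop stops at the first match. To find all occurrences, as the string matching problem requires, keep sliding until the end of the text.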
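A runnable sketch of Rabin-Karp that reports all occurrences is shown below. The slides do not state their hash function, so this version assumes a common choice: a polynomial rolling hash with base 256 modulo the small prime 101. The `base` and `q` parameters and the function name are hypothetical, so the hash values will not match the 37 and 100 in the example above. The point of the rolling hash is that the "next section of text, one character over" can be hashed in O(1) from the previous hash, instead of rehashing all M characters.

```python
def rabin_karp(T: str, P: str, base: int = 256, q: int = 101) -> list[int]:
    """Return all shifts s with T[s:s+m] == P, using an assumed rolling hash mod q."""
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, q)          # weight of the window's leading character
    hash_p = hash_t = 0
    for i in range(m):                  # hash the pattern and the first window
        hash_p = (hash_p * base + ord(P[i])) % q
        hash_t = (hash_t * base + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # equal hashes may be a collision, so confirm with a brute-force comparison
        if hash_p == hash_t and T[s:s + m] == P:
            shifts.append(s)
        if s < n - m:
            # roll one character right: remove T[s], append T[s + m]
            hash_t = ((hash_t - ord(T[s]) * high) * base + ord(T[s + m])) % q
    return shifts

print(rabin_karp("abcabababbc", "abab"))   # [3, 5]
print(rabin_karp("AGCTTGA", "GCT"))        # [1]
```

A small modulus such as 101 makes hash collisions fairly frequent, which is why the explicit `T[s:s + m] == P` check is kept. In practice a much larger prime reduces how often that check runs.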

08-07-2006

What is a graph? A set of vertices and edges. A graph can be:
- directed or undirected
- weighted or unweighted
- cyclic or acyclic

Representation of Graphs:
- Adjacency matrix: a V × V array, where matrix[i][j] stores whether there is an edge between the i-th vertex and the j-th vertex.
- Adjacency linked list: one linked list per vertex, each storing the vertices directly reachable from that vertex.
- Edge list: a single list of all edges (u, v).

Representation of Graphs (comparison):

| Operation | Adjacency Matrix | Adjacency Linked List | Edge List |
|---|---|---|---|
| Memory storage | O(V^2) | O(V+E) | O(E) |
| Check whether (u,v) is an edge | O(1) | O(deg(u)) | O(E) |
| Find all adjacent vertices of a vertex u | O(V) | O(deg(u)) | O(E) |

deg(u): the number of edges connecting vertex u.
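All three representations can be built from the same small undirected graph and compared on the operations in the table. This is a sketch with two assumptions. First, the graph and the helper name `build_representations` are hypothetical. Second, ordinary Python lists stand in for the linked lists, which keeps the same O(V+E) memory and O(deg(u)) scanning costs.

```python
def build_representations(num_vertices: int, edges: list[tuple[int, int]]):
    """Build an adjacency matrix and adjacency lists for an undirected graph."""
    matrix = [[False] * num_vertices for _ in range(num_vertices)]  # O(V^2) memory
    adj = [[] for _ in range(num_vertices)]                          # O(V + E) memory
    for u, v in edges:
        matrix[u][v] = matrix[v][u] = True   # undirected: store both directions
        adj[u].append(v)
        adj[v].append(u)
    return matrix, adj

edges = [(0, 1), (0, 2), (1, 3)]             # edge list: O(E) memory
matrix, adj = build_representations(4, edges)

print(matrix[0][2])                          # O(1) edge check -> True
print(2 in adj[0])                           # O(deg(0)) edge check -> True
print((0, 2) in edges or (2, 0) in edges)    # O(E) edge check -> True
print(adj[0])                                # O(deg(0)) neighbours -> [1, 2]
print([v for v in range(4) if matrix[0][v]]) # O(V) neighbours -> [1, 2]
```

The matrix wins on edge checks, while adjacency lists win on memory and neighbour enumeration for sparse graphs. This is why DFS and BFS are usually run on adjacency lists.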

Graph Searching: Why do we do graph searching? What do we search for? What information can we find from graph searching? How do we search the graph? Do we need to visit all vertices? In what order?

Depth-First Search (DFS). Strategy: go as far as you can through vertices you have not yet visited; when you cannot go further, go back and try another way.

Implementation: `DFS(vertex u) { mark u as visited; for each vertex v directly reachable from u: if v is unvisited, DFS(v) }`. Initially, all vertices are marked as unvisited.
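The recursive DFS pseudocode translates almost line for line into Python. In this sketch the graph is a hypothetical directed graph stored as adjacency lists in a dict. The function also records the visiting order, which the pseudocode does not do, so the traversal can be seen.

```python
def dfs(adj: dict[int, list[int]], start: int) -> list[int]:
    """Recursive DFS from start; returns vertices in the order they are visited."""
    visited = set()          # initially every vertex is unvisited
    order = []

    def visit(u: int) -> None:
        visited.add(u)       # mark u as visited
        order.append(u)
        for v in adj[u]:     # each vertex directly reachable from u
            if v not in visited:
                visit(v)     # go as deep as possible before trying the next v

    visit(start)
    return order

graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
print(dfs(graph, 1))         # [1, 2, 4, 3]
```

DFS reaches 4 through 2 before it ever tries 3, and vertex 5 is never visited because it is not reachable from 1. For very deep graphs, Python's default recursion limit (about 1000 frames) makes an explicit-stack version preferable.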

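Breadth-First Search (BFS): Instead of going as far as possible along one path, BFS explores all paths level by level. Every vertex one edge away from the start is visited before any vertex two edges away, and so on. BFS uses a queue to store vertices that have been visited but are not yet dead (not yet fully expanded), always expanding the earliest visited vertex first.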

Simulation of BFS (figure not reproduced: a graph on vertices 1 to 6). Queue: 1, 4, 3, 5, 2, 6.

Implementation:
- while queue Q is not empty:
  - dequeue the first vertex u from Q
  - for each vertex v directly reachable from u:
    - if v is unvisited, enqueue v to Q and mark v as visited

Initially, all vertices except the start vertex are marked as unvisited, and the queue contains the start vertex only.
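The BFS loop can be written directly with `collections.deque` as the queue Q. This is a minimal sketch on the same kind of hypothetical adjacency-list graph used for DFS. It records the dequeue order so the level-by-level behaviour is visible. As in the pseudocode, a vertex is marked visited when it is enqueued, so it can enter the queue at most once.

```python
from collections import deque

def bfs(adj: dict[int, list[int]], start: int) -> list[int]:
    """BFS from start; returns vertices in the order they are dequeued."""
    visited = {start}            # only the start vertex is visited initially
    queue = deque([start])       # and the queue holds only the start vertex
    order = []
    while queue:                 # while queue Q is not empty
        u = queue.popleft()      # dequeue the earliest visited vertex
        order.append(u)
        for v in adj[u]:         # each vertex directly reachable from u
            if v not in visited:
                visited.add(v)   # mark on enqueue, so v enters Q only once
                queue.append(v)
    return order

graph = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
print(bfs(graph, 1))             # [1, 2, 3, 4]
```

Unlike the DFS order [1, 2, 4, 3] on the same graph, BFS visits both neighbours of 1 before vertex 4, which is two edges away.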
