Published by Shana Davis. Modified over 9 years ago.
1
A* Search
A* (pronounced "A star") is a best-first graph search algorithm that finds the least-cost path from a given initial node to one goal node out of one or more possible goals.
2
Definitions
A* uses a distance-plus-estimate heuristic function, denoted f(x), to determine the order in which the search visits nodes in the tree induced by the search. This function is the sum of two functions: the path-cost function g(x), which is the cost from the start node to the current node, and an admissible "heuristic estimate" h(x) of the distance to the goal, so f(x) = g(x) + h(x). An admissible h(x) must not overestimate the distance to the goal. For an application like routing, h(x) might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points (or nodes, for that matter).
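The definitions above can be sketched in code. This is a minimal illustration rather than the presenter's implementation: the function name `a_star`, the callback parameters `neighbors` and `h`, and the small example graph are all assumptions made for this sketch. It assumes h is admissible and equals 0 at the goal.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Return the least cost from start to goal, or None if unreachable.

    neighbors(node) yields (next_node, edge_cost) pairs.
    h(node) is an admissible estimate of the remaining cost to the goal.
    """
    best_g = {start: 0}                 # cheapest known g(x) per node
    frontier = [(h(start), 0, start)]   # entries are (f, g, node)
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                    # stale entry: a cheaper path was found
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                # f(x) = g(x) + h(x) decides the visiting order
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None

# Hypothetical example graph and heuristic values (not from the slides).
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
         "B": [("G", 1)], "G": []}
estimate = {"S": 3, "A": 2, "B": 1, "G": 0}
result = a_star("S", "G", lambda n: graph[n], lambda n: estimate[n])
```

Because a node is pushed again whenever a cheaper path to it is found, the sketch does not require h to be consistent, only admissible.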
3
An A* Algorithm for Edit Distance
Edit distance $D_E(X,Y)$ measures how close string X is to string Y. $D_E(X,Y)$ is the cost of the minimum-cost transformation $t : X \to Y$, where t is a sequence of operations (insertion, equal substitution, unequal substitution, and deletion). The cost of t is the sum of the operation costs, where each operation costs 1 except for equal substitution, which costs 0.
Example: ABBAC → BAACA. The cost of this transformation is 3, which happens to be minimal.
4
Dynamic Programming Solution (an O(mn) solution)
Decomposition: last operation (delete, substitute, or insert)
Atomic problems: X prefix or Y prefix empty
Table: rows 0..M for X prefix characters, columns 0..N for Y prefix characters
Table entry: $D_E(X_i, Y_j)$
Composition: let $\delta_{ij}$ = cost(substitution) = 1 if $x_i \ne y_j$ and 0 otherwise. Then
$D_E(X_i, Y_j) = \min\{\, D_E(X_{i-1}, Y_j) + 1,\; D_E(X_{i-1}, Y_{j-1}) + \delta_{ij},\; D_E(X_i, Y_{j-1}) + 1 \,\}$
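The table-filling scheme on this slide can be sketched as follows. The function name `edit_distance` and the variable `delta` (the substitution cost) are names chosen for this illustration, and the code is one straightforward way to realize the recurrence, not necessarily the presenter's.

```python
def edit_distance(x, y):
    """Fill the (M+1) x (N+1) table of prefix distances and return D_E(x, y)."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # Atomic problems: one of the prefixes is empty.
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters of the X prefix
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters of the Y prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1   # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i - 1][j - 1] + delta,     # substitution
                          d[i][j - 1] + 1)             # insertion
    return d[m][n]
```

For the strings used elsewhere in the slides, `edit_distance("ABBAC", "BAACA")` gives 3 and `edit_distance("abbab", "bbaba")` gives 2.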
5
Edit Distance as a Shortest Path Problem
Define a transformation graph $G_{XY} = (V, E)$ as follows:
The set V of nodes (vertices) is $\{0..M\} \times \{0..N\}$, where node $n_{p,q}$ represents the state of transforming a p-length prefix of X into a q-length prefix of Y.
The set E of edges represents the operations of
deletion, connecting node $n_{p,q}$ to $n_{p+1,q}$ with length 1;
substitution, connecting node $n_{p,q}$ to $n_{p+1,q+1}$ with length 0 or 1 depending on whether $x_{p+1} = y_{q+1}$ or not;
insertion, connecting node $n_{p,q}$ to $n_{p,q+1}$ with length 1.
The start and goal nodes are $n_{0,0}$ and $n_{M,N}$.
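To make the graph definition concrete, the sketch below generates the outgoing edges of a node and runs a uniform-cost (Dijkstra-style) search from the start node to the goal node, which is A* with h = 0. The names `successors` and `shortest_path_length` are assumptions for this illustration; nodes are represented as `(p, q)` tuples, and the strings are 0-indexed, so the slide's (p+1)th character of X is `x[p]`.

```python
import heapq

def successors(p, q, x, y):
    """Yield ((p', q'), length) for each edge leaving node (p, q)."""
    m, n = len(x), len(y)
    if p < m:
        yield (p + 1, q), 1                              # deletion
    if p < m and q < n:
        yield (p + 1, q + 1), 0 if x[p] == y[q] else 1   # substitution
    if q < n:
        yield (p, q + 1), 1                              # insertion

def shortest_path_length(x, y):
    """Length of the shortest path from node (0, 0) to node (M, N)."""
    goal = (len(x), len(y))
    dist = {(0, 0): 0}
    frontier = [(0, (0, 0))]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            return d
        if d > dist[node]:
            continue                                     # stale queue entry
        for nxt, length in successors(*node, x, y):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(frontier, (nd, nxt))
```

The goal is always reachable (delete everything, then insert everything), so the loop returns before the queue empties.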
6
Introduction: Edit Distance Based on Single-Character Edit Operations
Insertion ($\varepsilon \to a$): inserts an “a” into the target without affecting the source; cost = 1
Equal substitution ($a \to a$): substitutes an “a” into the target for an “a” in the source; cost = 0
Unequal substitution ($a \to b$): substitutes a “b” into the target for an “a” in the source; cost = 1
Deletion ($a \to \varepsilon$): deletes an “a” from the source without affecting the target; cost = 1
7
Example of a Transformation Graph
The vertices of T correspond to prefix pairs of X and Y. The edges of T are directed and correspond to the single-character edit operations that transform one prefix pair into another.
X = abbab, Y = bbaba
8
$D_E(X,Y)$ = cost of the shortest path from the start vertex to the goal vertex = 2
9
A Frequency-Based Lower Bound Function h
Let $X_i$ be the suffix of X beginning with the ith character, and let $Y_j$ be defined similarly.
If X = abbab and Y = bbaba, then $X_2$ = bbab and $Y_2$ = baba, so
Excess($X_2$, $Y_2$, a) = 0 and Def($X_2$, $Y_2$, a) = 1
Excess($X_2$, $Y_2$, b) = 1 and Def($X_2$, $Y_2$, b) = 0
Excess($X_2$, $Y_2$) is the sum of the excesses over the alphabet and Def($X_2$, $Y_2$) is the sum of the deficiencies; here both sums equal 1.
h($X_2$, $Y_2$) = max{Excess($X_2$, $Y_2$), Def($X_2$, $Y_2$)} is a lower bound on the length of the shortest path from the corresponding vertex to the goal.
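The lower bound h can be computed directly from character counts. The sketch below is one possible way to do it; the function name `freq_lower_bound` is an assumption for this illustration. It relies on `collections.Counter` subtraction, which keeps only positive counts.

```python
from collections import Counter

def freq_lower_bound(x_suffix, y_suffix):
    """max(total excess, total deficiency) of x_suffix relative to y_suffix."""
    fx, fy = Counter(x_suffix), Counter(y_suffix)
    excess = sum((fx - fy).values())       # symbols x_suffix has too many of
    deficiency = sum((fy - fx).values())   # symbols x_suffix is missing
    return max(excess, deficiency)
```

For the suffixes on this slide, `freq_lower_bound("bbab", "baba")` evaluates to 1.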
10
Classification and Strings
11
Applications of Edit Distance
DNA analysis
Classification of heart beats
Handwriting recognition
Spelling correction
Error correction of variable-length codes
Speech recognition
12
Discrete Directional Alphabet
13
Mapping EKG’s to Strings
14
Classification as Path Problem
LB(Start, Goal-1) = 0
LB(Start, Goal-2) = 3
15
Lower Bounds to Edit Distance: Lower Bound Based on Frequency
Let $f_a(X)$ and $f_a(Y)$ be the frequencies of a in X and Y.
Define $\mathrm{Ex}(a,X,Y) = f_a(X) - f_a(Y)$ if $f_a(X) > f_a(Y)$, else 0.
Define $\mathrm{Def}(a,X,Y) = f_a(Y) - f_a(X)$ if $f_a(Y) > f_a(X)$, else 0.
For any a, both $\mathrm{Ex}(a,X,Y) \le D(X,Y)$ and $\mathrm{Def}(a,X,Y) \le D(X,Y)$.
For distinct symbols a and b, $\mathrm{Ex}(a,X,Y) + \mathrm{Ex}(b,X,Y) \le D(X,Y)$.
In general, $\max\{\sum_a \mathrm{Ex}(a,X,Y),\ \sum_a \mathrm{Def}(a,X,Y)\} \le D(X,Y)$.
LB(i,j,X,Y), computed for the ith suffix of X and the jth suffix of Y, is a lower bound on the remaining distance after having computed the edit distance for the ith and jth prefixes of X and Y.
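The slide states the frequency bound without an argument. A brief justification, added here as a sketch and not taken from the slides, counts how many excess or deficient symbols a single operation can fix:

```latex
\begin{aligned}
&\text{A deletion or an unequal substitution removes at most one excess symbol of } X;\\
&\text{an insertion or an equal substitution removes none. Hence }
  \sum_a \mathrm{Ex}(a,X,Y) \le D(X,Y).\\
&\text{An insertion or an unequal substitution supplies at most one deficient symbol;}\\
&\text{a deletion or an equal substitution supplies none. Hence }
  \sum_a \mathrm{Def}(a,X,Y) \le D(X,Y).\\
&\text{Combining the two: }
  \max\Big\{\sum_a \mathrm{Ex}(a,X,Y),\ \sum_a \mathrm{Def}(a,X,Y)\Big\} \le D(X,Y).
\end{aligned}
```

The maximum is used rather than the sum because one unequal substitution can remove an excess symbol and supply a deficient one at the same time.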
16
Lower Bounds to Edit Distance: Lower Bound Based on Frequency
Since X has a deficiency of 1 b with Y1 as a target, 1 is a lower bound to D(X,Y1).
Since X has a deficiency of 2 a’s with Y2 as a target and an excess of 1 b, 2 is a lower bound to D(X,Y2).
Since X has a deficiency of 3 b’s with Y3 as a target and an excess of 2 a’s, 3 is a lower bound to D(X,Y3).
Consequently, the initial vertices of the 3 transformation graphs are organized into a priority queue as shown to the left.
17
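A* Search for Closest Target
f = h + g
The search keeps track of the last operation, since on a minimum-cost path an insertion cannot be followed by a deletion and vice versa (the pair can be replaced by a single substitution of cost at most 1).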
18
A* Search for Closest Target
Finds a distance of 1 to Y1 in 3 steps. Y1 must be a closest goal since bnd + dist is minimized.
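The closest-target search can be sketched as a single A* run over all transformation graphs at once, with one shared priority queue ordered by f = g + h. The code is an illustration under stated assumptions: the names `h` and `closest_target`, the `(f, g, k, p, q)` queue layout, and the example strings in the usage note are all hypothetical, and the last-operation pruning mentioned on the previous slide is omitted for brevity.

```python
import heapq
from collections import Counter

def h(x_suffix, y_suffix):
    """Frequency-based lower bound: max(total excess, total deficiency)."""
    fx, fy = Counter(x_suffix), Counter(y_suffix)
    return max(sum((fx - fy).values()), sum((fy - fx).values()))

def closest_target(x, targets):
    """Return (index, distance) of a target closest to x, or None if no targets."""
    frontier, best_g = [], {}
    # Seed the queue with the start vertex of every transformation graph.
    for k, y in enumerate(targets):
        best_g[(k, 0, 0)] = 0
        heapq.heappush(frontier, (h(x, y), 0, k, 0, 0))   # (f, g, k, p, q)
    while frontier:
        f, g, k, p, q = heapq.heappop(frontier)
        y = targets[k]
        if p == len(x) and q == len(y):
            return k, g                   # first goal popped has minimal f = g
        if g > best_g[(k, p, q)]:
            continue                      # stale queue entry
        moves = []
        if p < len(x):
            moves.append((p + 1, q, 1))                              # deletion
        if p < len(x) and q < len(y):
            moves.append((p + 1, q + 1, 0 if x[p] == y[q] else 1))   # substitution
        if q < len(y):
            moves.append((p, q + 1, 1))                              # insertion
        for p2, q2, cost in moves:
            g2 = g + cost
            if g2 < best_g.get((k, p2, q2), float("inf")):
                best_g[(k, p2, q2)] = g2
                heapq.heappush(frontier, (g2 + h(x[p2:], y[q2:]), g2, k, p2, q2))
    return None
```

As a usage example with made-up strings, `closest_target("abbab", ["bbaba", "aaaaa", "abbb"])` returns `(2, 1)`, since deleting one "a" turns abbab into abbb.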