1
The Application of the A* Algorithm to the Sequence Alignment Problem
Adviser: Prof. R. C. T. Lee
Speaker: J. R. Hsu
2
National Chi Nan University
Abstract
We introduce the application of the A* algorithm to the sequence alignment problem. This A* algorithm is based on the principle of Dijkstra’s algorithm. We also introduce a technique that reduces the time complexity of this A* algorithm.
3
The Sequence Alignment Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Alignment 1 (score: ?)
S1 = "AATCCGTG--------"
S2 = "--------AACCGTTG"
Alignment 2 (score: ?)
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Alignment 3 (score: ?)
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
4
The Sequence Alignment Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
The score of aligning two identical characters is 1, while the score of aligning two different characters (including a character with a gap) is -1.
Alignment 1
S1 = "AATCCGTG--------"
S2 = "--------AACCGTTG"
-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 = -16
Alignment 2
S1 = "AATCCGTG"
S2 = "AACCGTTG"
+1+1-1+1-1-1+1+1 = 2
Alignment 3
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+1+1-1+1+1+1-1+1+1 = 5
5
Levenshtein Distance Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
The score of aligning two identical characters is 0, while the score of aligning two different characters (including a character with a gap) is 1.
Alignment 1
S1 = "AATCCGTG--------"
S2 = "--------AACCGTTG"
+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1 = 16
Alignment 2
S1 = "AATCCGTG"
S2 = "AACCGTTG"
+0+0+1+0+1+1+0+0 = 3
Alignment 3
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+0+0+1+0+0+0+1+0+0 = 2
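The two scoring schemes on the slides above differ only in the values given to a matching and a non-matching column. A minimal Python sketch, assuming the two aligned strings have equal length and use "-" for a gap (the function name alignment_score is illustrative, not from the slides):

```python
def alignment_score(a1, a2, same, different):
    """Sum the per-column scores of two aligned strings.

    a1, a2    : aligned sequences of equal length, '-' marks a gap
    same      : score of a column holding two identical characters
    different : score of any other column (mismatch or gap)
    Assumption: no column holds a gap in both strings.
    """
    if len(a1) != len(a2):
        raise ValueError("aligned strings must have equal length")
    return sum(same if x == y else different for x, y in zip(a1, a2))


alignments = [
    ("AATCCGTG--------", "--------AACCGTTG"),
    ("AATCCGTG", "AACCGTTG"),
    ("AATCCG-TG", "AA-CCGTTG"),
]

# Sequence alignment scoring (+1 / -1): higher is better.
similarity = [alignment_score(a, b, 1, -1) for a, b in alignments]
# Levenshtein scoring (0 / 1): lower is better.
distance = [alignment_score(a, b, 0, 1) for a, b in alignments]
```

With the three alignments from the slides, similarity is [-16, 2, 5] and distance is [16, 3, 2], so the third alignment is the best under both schemes.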
6
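The Dynamic Programming Approach to the Levenshtein Distance Problem
Let S1 = "M1 M2 … Mm" and S2 = "N1 N2 … Nn" denote two sequences. Let D(i,j) denote the Levenshtein distance between "M1 M2 … Mi" and "N1 N2 … Nj". D(m,n) is computed recursively according to the standard Levenshtein recurrence:
D(i,0) = i for 0 ≤ i ≤ m, and D(0,j) = j for 0 ≤ j ≤ n.
D(i,j) = min{ D(i-1,j) + 1, D(i,j-1) + 1, D(i-1,j-1) + c(i,j) } for i, j ≥ 1, where c(i,j) = 0 if Mi = Nj and c(i,j) = 1 otherwise.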
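A minimal Python sketch of filling the table D(i,j), assuming the standard unit-cost recurrence (a gap or a mismatch costs 1, a match costs 0, as in the Levenshtein scoring above). The function name levenshtein_table is illustrative, not from the slides.

```python
def levenshtein_table(s1, s2):
    """Return the (m+1) x (n+1) table D, where D[i][j] is the Levenshtein
    distance between the first i characters of s1 and the first j of s2."""
    m, n = len(s1), len(s2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i          # delete the first i characters of s1
    for j in range(n + 1):
        D[0][j] = j          # insert the first j characters of s2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,       # gap in s2
                          D[i][j - 1] + 1,       # gap in s1
                          D[i - 1][j - 1] + c)   # match or mismatch
    return D


table = levenshtein_table("AATCCGTG", "AACCGTTG")
result = table[8][8]
```

For S1 and S2 from the slides, result is 2, which agrees with the best alignment found earlier. Every one of the (m+1)(n+1) cells is filled, so this approach always takes O(mn) time.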
7
The Dynamic Programming Approach to the Levenshtein Distance Problem
[Figure: the dynamic programming table D(i,j) for S1 and S2.]
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Resulting alignment:
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+0+0+1+0+0+0+1+0+0 = 2
8
Viewing the Levenshtein Distance Problem as the Shortest Path Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Alignment:
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+0+0+1+0+0+0+1+0+0 = 2
An edge that aligns two different characters, or a character with a gap, is associated with length 1, while an edge that aligns two identical characters is associated with length 0.
9
Dijkstra’s Algorithm for the Levenshtein Distance Problem
Because we can view the Levenshtein distance problem as a shortest path problem, we can use Dijkstra’s algorithm to solve it.
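A minimal Python sketch of this view, assuming the usual grid graph: node (i,j) means that the first i characters of S1 and the first j characters of S2 have been aligned, the source is (0,0) and the target is (m,n). The binary heap from the standard heapq module and the function name levenshtein_dijkstra are choices made for this sketch, not details taken from the slides.

```python
import heapq


def grid_neighbors(s1, s2, i, j):
    """Yield (next_i, next_j, edge_length) for the up to 3 edges leaving (i, j)."""
    m, n = len(s1), len(s2)
    if i < m:
        yield i + 1, j, 1                      # s1[i] aligned with a gap
    if j < n:
        yield i, j + 1, 1                      # s2[j] aligned with a gap
    if i < m and j < n:
        yield i + 1, j + 1, 0 if s1[i] == s2[j] else 1   # match / mismatch


def levenshtein_dijkstra(s1, s2):
    """Shortest path length from (0, 0) to (m, n) in the grid graph."""
    target = (len(s1), len(s2))
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]                         # entries are (distance, i, j)
    done = set()
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) in done:
            continue                           # stale heap entry
        done.add((i, j))
        if (i, j) == target:
            return d
        for ni, nj, w in grid_neighbors(s1, s2, i, j):
            if d + w < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = d + w
                heapq.heappush(heap, (d + w, ni, nj))
    raise AssertionError("the target is always reachable")


result = levenshtein_dijkstra("AATCCGTG", "AACCGTTG")
```

For S1 and S2 from the slides, result is 2, the same value as the dynamic programming approach.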
10
Dijkstra’s Algorithm for the Levenshtein Distance Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Alignment:
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+0+0+1+0+0+0+1+0+0 = 2
An edge that aligns two different characters, or a character with a gap, is associated with length 1, while an edge that aligns two identical characters is associated with length 0.
[Figure: the grid graph for S1 and S2, with each visited node labeled by the length of the shortest path from the source node.]
11
The A* Algorithm for the Levenshtein Distance Problem
We improve the foregoing Dijkstra’s algorithm with the idea behind the A* algorithm. For every node v, instead of maintaining only the length of the shortest path from the source node to v, we also maintain a lower bound on the length of the shortest path from the source node, through v, to the target node.
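A minimal Python sketch of this idea. The slides do not state which lower bound is used, so the sketch assumes one simple choice: from node (i,j), at least |(m-i) - (n-j)| more gaps are needed to reach (m,n), so f(v) = g(v) + |(m-i) - (n-j)| never overestimates the length of a path through v. The heap and the function name levenshtein_astar are also assumptions of this sketch.

```python
import heapq


def levenshtein_astar(s1, s2):
    """Return (distance, number_of_visited_nodes) using A* on the grid graph."""
    m, n = len(s1), len(s2)

    def h(i, j):
        # Assumed lower bound on the remaining path length from (i, j):
        # the remaining suffixes differ in length by this many characters.
        return abs((m - i) - (n - j))

    g = {(0, 0): 0}                     # best known distance from the source
    heap = [(h(0, 0), 0, 0)]            # entries are (f = g + h, i, j)
    visited = set()
    while heap:
        _, i, j = heapq.heappop(heap)
        if (i, j) in visited:
            continue                    # stale heap entry
        visited.add((i, j))
        if (i, j) == (m, n):
            return g[(i, j)], len(visited)
        edges = []
        if i < m:
            edges.append((i + 1, j, 1))
        if j < n:
            edges.append((i, j + 1, 1))
        if i < m and j < n:
            edges.append((i + 1, j + 1, 0 if s1[i] == s2[j] else 1))
        for ni, nj, w in edges:
            ng = g[(i, j)] + w
            if ng < g.get((ni, nj), float("inf")):
                g[(ni, nj)] = ng
                heapq.heappush(heap, (ng + h(ni, nj), ni, nj))
    raise AssertionError("the target is always reachable")


distance, visited_count = levenshtein_astar("AATCCGTG", "AACCGTTG")
```

For S1 and S2 from the slides, distance is 2. Only nodes whose lower bound does not exceed 2 can be visited, so visited_count is smaller than the 81 cells that the dynamic programming table fills.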
12
The A* Algorithm for the Levenshtein Distance Problem
S1 = "AATCCGTG"
S2 = "AACCGTTG"
Alignment:
S1 = "AATCCG-TG"
S2 = "AA-CCGTTG"
+0+0+1+0+0+0+1+0+0 = 2
An edge that aligns two different characters, or a character with a gap, is associated with length 1, while an edge that aligns two identical characters is associated with length 0.
[Figure: the grid graph for S1 and S2, with each visited node labeled by its lower bound.]
13
The Analysis of the Time Complexity of the A* Algorithm for the Shortest Path Problem
Given: a graph G with n nodes.
We must visit n nodes in the worst case.
Each time we visit a node v, we must update up to n-1 adjacent nodes in the worst case.
After each visit, we must choose the next node to visit from among up to n-1 nodes in the worst case.
The time complexity of this A* algorithm is O(n^2).
14
The Time Complexity Is Too High
Given a graph G with n nodes, the time complexity of using the A* algorithm to solve the shortest path problem is O(n^2).
Following this analysis, given two sequences S1 = "M1 M2 … Mm" and S2 = "N1 N2 … Nn", the graph has (m+1)(n+1) nodes, so the time complexity of using the A* algorithm to solve the Levenshtein distance problem is O(m^2 n^2).
This is more time-consuming than the dynamic programming approach, which takes O(mn) time.
15
The Analysis of the Time Complexity of the A* Algorithm for the Levenshtein Distance Problem
Given: two sequences S1 = "M1 M2 … Mm" and S2 = "N1 N2 … Nn".
We must visit (m+1)(n+1) nodes in the worst case.
Each time we visit a node v, we must update at most 3 adjacent nodes. This takes constant time.
After each visit, instead of searching for the next node, we pick a node that is guaranteed to be a feasible next node to visit. This takes constant time.
The time complexity of this A* algorithm is O(mn).
16
The Pick
The idea of this technique is to partition the set of nodes into subsets such that all nodes in the same subset have the same priority to be visited. When we must choose the next node from a subset, it does not matter which one we choose, because all nodes in that subset have the same priority.
17
The Pick
[Figure: the grid graph for S1 and S2 with node priorities, a static array indexed by priority (0 to 8), and a dynamic doubly linked list of nodes for each priority.]
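A minimal Python sketch of the pick, under several assumptions that are not stated on the slides: the priority of a node is its lower bound f = g + |(m-i) - (n-j)|, the static array is a Python list indexed by priority, and each bucket is a plain list with lazy deletion of outdated entries rather than a doubly linked list. The function name levenshtein_bucket_astar is illustrative.

```python
def levenshtein_bucket_astar(s1, s2):
    """A* on the grid graph, choosing the next node from priority buckets."""
    m, n = len(s1), len(s2)

    def h(i, j):
        # Assumed lower bound on the remaining path length from (i, j).
        return abs((m - i) - (n - j))

    # buckets[p] holds nodes whose priority (lower bound) is p.
    # A visited node has priority at most max(m, n), and each edge raises
    # the priority by at most 2, so m + n + 3 buckets are always enough.
    buckets = [[] for _ in range(m + n + 3)]
    g = {(0, 0): 0}
    buckets[h(0, 0)].append((0, 0))
    visited = set()
    current = 0                           # lowest priority that may be non-empty
    while current < len(buckets):
        if not buckets[current]:
            current += 1                  # priorities never decrease, so the
            continue                      # pointer only moves forward
        i, j = buckets[current].pop()     # any node of this bucket will do
        if (i, j) in visited:
            continue                      # outdated entry, already visited
        visited.add((i, j))
        if (i, j) == (m, n):
            return g[(i, j)]
        edges = []
        if i < m:
            edges.append((i + 1, j, 1))
        if j < n:
            edges.append((i, j + 1, 1))
        if i < m and j < n:
            edges.append((i + 1, j + 1, 0 if s1[i] == s2[j] else 1))
        for ni, nj, w in edges:
            ng = g[(i, j)] + w
            if ng < g.get((ni, nj), float("inf")):
                g[(ni, nj)] = ng
                buckets[ng + h(ni, nj)].append((ni, nj))
    raise AssertionError("the target is always reachable")


result = levenshtein_bucket_astar("AATCCGTG", "AACCGTTG")
```

Each node has at most 3 incoming edges, so it is placed in a bucket at most 3 times, and the bucket pointer only moves forward. Taking a node from the current bucket therefore costs constant time on average, which matches the O(mn) bound of the analysis above.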
18
Conclusion
When solving the sequence alignment problem, this A* algorithm performs fewer computations than the dynamic programming approach.