Published by Dario Slee. Modified about 1 year ago.

1
Analysis of a Real-World NP-Complete Graph Problem: DCJ Median Algorithm to Find the Ancestor of Three Genomes
Zhaoming Yin, School of CSE, Georgia Tech

2
Fundamentals
Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences....

3
Fundamentals
In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were 99% similar, yet these surprisingly identical gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution.
[Figure: examples of the rearrangement operations inversion, transposition, and inverted transposition]
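The three rearrangement operations named on this slide can be illustrated on a signed gene order. This is a minimal sketch, assuming a genome is represented as a Python list of signed integers (a negative sign marks a reversed gene); the function names and index conventions are illustrative and not taken from the slides.

```python
def inversion(genome, i, j):
    """Reverse genome[i:j] and flip the sign of every gene in that segment."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k] (i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Move genome[i:j] past genome[j:k] and invert the moved segment."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


g = [1, 2, 3, 4, 5, 6]
print(inversion(g, 1, 4))                  # [1, -4, -3, -2, 5, 6]
print(transposition(g, 1, 3, 5))           # [1, 4, 5, 2, 3, 6]
print(inverted_transposition(g, 1, 3, 5))  # [1, 4, 5, -3, -2, 6]
```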

4
Fundamentals
Maximum parsimony phylogeny: optimize each ancestral node of an unrooted phylogeny in terms of its three or more immediate neighbours, modern or ancestral, and iterate across the tree until the objective function converges (to a local optimum) at all nodes.

5
Breakpoint Graph
[Figure: breakpoint graph whose vertices are signed gene extremities]
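To make the breakpoint graph concrete, here is a small sketch that builds it for two genomes and counts its alternating cycles. It assumes unichromosomal circular genomes over the same n genes, for which the DCJ distance is n minus the number of cycles; the extremity encoding `(gene, 't')` / `(gene, 'h')` and all function names are illustrative choices, not the notation of the slides.

```python
def adjacencies(genome):
    """Adjacencies of a circular signed genome as pairs of gene extremities.

    Gene g has a tail (g, 't') and a head (g, 'h'); a negative sign means
    the gene is read head-first.
    """
    def left(g):   # extremity met first when reading the gene
        return (abs(g), 't') if g > 0 else (abs(g), 'h')

    def right(g):  # extremity met last when reading the gene
        return (abs(g), 'h') if g > 0 else (abs(g), 't')

    n = len(genome)
    return [(right(genome[i]), left(genome[(i + 1) % n])) for i in range(n)]


def count_cycles(genome_a, genome_b):
    """Number of alternating cycles in the breakpoint graph of two genomes."""
    nbr_a, nbr_b = {}, {}
    for nbr, genome in ((nbr_a, genome_a), (nbr_b, genome_b)):
        for u, v in adjacencies(genome):
            nbr[u], nbr[v] = v, u
    seen, cycles = set(), 0
    for start in nbr_a:
        if start in seen:
            continue
        cycles += 1
        u = start
        while u not in seen:      # follow an A-edge, then a B-edge
            seen.add(u)
            w = nbr_a[u]
            seen.add(w)
            u = nbr_b[w]
    return cycles


def dcj_distance(genome_a, genome_b):
    """n - c, valid under the circular-genome assumption stated above."""
    return len(genome_a) - count_cycles(genome_a, genome_b)


print(dcj_distance([1, 2, 3], [1, 2, 3]))   # 0
print(dcj_distance([1, 2, 3], [1, -2, 3]))  # 1
```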

6
MBG/0-Matching

7
Subgraph/Decomposer
[Figure labels: Subgraph, H-crossing]

8
Adequate Subgraph
Definition: In an MBG for a set of genomes G, a connected subgraph H of size m is an adequate subgraph if $c_{\max}(H) \ge \frac{1}{2} m N_G$; it is strongly adequate if $c_{\max}(H) > \frac{1}{2} m N_G$. (m is the size of the subgraph, measured by its nodes; $N_G$ is the number of genomes, which is 3 for the median-of-three problem.)
Property: An adequate subgraph is simple if it does not contain another adequate subgraph.
Lemma: An adequate subgraph is a decomposer.
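The definition can be checked by brute force on very small subgraphs. The sketch below rests on two assumptions that the slide does not spell out: that the size m is half the number of nodes of H, and that c_max(H) is the largest number of cycles, each alternating between 0-edges and edges of a single color, over all perfect matchings of H's nodes. All function names are hypothetical.

```python
def perfect_matchings(vertices):
    """Yield every perfect matching of the given vertices as a list of pairs."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def cycles_with_color(matching, color_edges):
    """Count cycles alternating between matching edges and edges of one color."""
    m_nbr, c_nbr = {}, {}
    for u, v in matching:
        m_nbr[u], m_nbr[v] = v, u
    for u, v in color_edges:
        c_nbr[u], c_nbr[v] = v, u
    seen, cycles = set(), 0
    for start in m_nbr:
        if start in seen:
            continue
        u, closed = start, True
        while True:
            seen.add(u)
            w = m_nbr[u]
            seen.add(w)
            if w not in c_nbr:   # the walk ends in a path, not a cycle
                closed = False
                break
            u = c_nbr[w]
            if u == start:
                break
        cycles += closed
    return cycles


def c_max(vertices, edges_by_color):
    """Best total cycle count over all candidate 0-matchings (exponential)."""
    return max(sum(cycles_with_color(m, edges) for edges in edges_by_color)
               for m in perfect_matchings(vertices))


def is_adequate(vertices, edges_by_color, n_genomes=3):
    m = len(vertices) // 2               # assumed size convention
    return 2 * c_max(vertices, edges_by_color) >= m * n_genomes


# Two nodes joined by parallel edges of two colors: 2 cycles >= 1.5.
print(is_adequate(['a', 'b'], [[('a', 'b')], [('a', 'b')], []]))  # True
# A single colored edge gives only 1 cycle < 1.5.
print(is_adequate(['a', 'b'], [[('a', 'b')], [], []]))            # False
```

The enumeration is exponential in the number of nodes, so it is only meant for checking small candidate subgraphs by hand.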

9
Adequate Subgraph

10
Algorithm: AS1()
for each v do
    if v[0] = v[1] or v[0] = v[2] or v[1] = v[2] then
        these two points are AS;
        the edge connecting them is the major set;
    endif
endfor
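The AS1 test can be sketched in a few lines. This assumes the graph is stored as a dictionary `neighbors` in which `neighbors[v][i]` is the neighbour of `v` along the edge of color `i`, matching the `v[0]`, `v[1]`, `v[2]` notation of the pseudocode; the function name and return format are illustrative.

```python
def find_as1(neighbors):
    """Return node pairs joined by two parallel edges of different colors.

    neighbors[v] = [v0, v1, v2]: the neighbour of v along each of the three
    colors. Each returned pair is the edge that AS1 puts into the major set.
    """
    found = set()
    for v, (n0, n1, n2) in neighbors.items():
        for a, b in ((n0, n1), (n0, n2), (n1, n2)):
            if a == b:
                found.add(frozenset((v, a)))
                break
    return sorted(tuple(sorted(pair)) for pair in found)


# Colors 0 and 1 both join 0-1 and 2-3; color 2 joins 0-2 and 1-3.
graph = {0: [1, 1, 2], 1: [0, 0, 3], 2: [3, 3, 0], 3: [2, 2, 1]}
print(find_as1(graph))  # [(0, 1), (2, 3)]
```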

11
Adequate Subgraph
[Figure: adequate subgraphs, 2 check marks]

12
Algorithm: AS2()
for each color c do
    for each v do
        if v[c1][c] = v[c][c2] (1)
        or v[c2][c] = v[c][c1] (2)
        or v[c2][c1] = v[c][c2] (3)
        or v[c1][c2] = v[c][c1] (4) then
            v, v[c], v[c1], v[c2] are AS;
            (1): major set is (v, v[c1]) and (v[c], v[c2])
            (2): major set is (v, v[c2]) and (v[c], v[c1])
            (3): major set is (v, v[c]) and (v[c1], v[c][c2])
            (4): major set is (v, v[c]) and (v[c2], v[c][c1])
        endif
    endfor
endfor
[Figure: cases (1) to (4), with edges labeled c, c1, c2]
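The four tests on this slide can be transcribed literally. The sketch below assumes the same representation as the pseudocode, with `nbr[v][c]` the neighbour of `v` along color `c`, so that `v[c1][c]` reads `nbr[nbr[v][c1]][c]`. Because the slide does not say how `c1` and `c2` are picked, they are passed in explicitly; the function name is hypothetical.

```python
def as2_case(nbr, v, c, c1, c2):
    """Return the number (1-4) of the first test that holds, or None."""
    vc, vc1, vc2 = nbr[v][c], nbr[v][c1], nbr[v][c2]
    tests = (
        nbr[vc1][c] == nbr[vc][c2],    # (1)
        nbr[vc2][c] == nbr[vc][c1],    # (2)
        nbr[vc2][c1] == nbr[vc][c2],   # (3)
        nbr[vc1][c2] == nbr[vc][c1],   # (4)
    )
    for number, holds in enumerate(tests, start=1):
        if holds:
            return number
    return None


# Six nodes; color 0: 0-1, 2-3, 4-5; color 1: 0-2, 1-4, 3-5; color 2: 0-4, 1-3, 2-5.
graph = {0: [1, 2, 4], 1: [0, 4, 3], 2: [3, 0, 5],
         3: [2, 5, 1], 4: [5, 1, 0], 5: [4, 3, 2]}
print(as2_case(graph, v=0, c=0, c1=1, c2=2))  # 1
```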

13
Algorithm: AS2()
for each color c do
    for each v do
        if v[c1][c] = v[c][c1] and (v[c1] = v[c][c2] || v[c1] = v[c][c2]) (1)
        or v[c1][c2] = v[c][c1] and (v[c1] = v[c][c2] || v[c1] = v[c][c2]) (2) then
            v, v[c], v[c1], v[c2] are AS;
            (1): major set is (v, v[c1]) and (v[c], v[c2])
            (2): major set is (v, v[c2]) and (v[c], v[c1])
        endif
    endfor
endfor
[Figure: cases (1) and (2), with edges labeled c, c1, c2]

14
Algorithm: AS2()
for each color c do
    for each v do
        if v[c1][c] = v[c][c1] and v[c2][c] = v[c][c2] and v[c1] != v[c][c2] and v[c2] != v[c][c1] then
            v, v[c], v[c1], v[c2] are AS;
            (1): major set is (v, v[c1]) and (v[c], v[c2])
        endif
    endfor
endfor
[Figure: edges labeled c, c1, c2]

15
Algorithm: AS2()
for each color c do
    for each v do
        if v[c1][c] = v[c][c1] and type three is not found then
            v, v[c], v[c1], v[c2] are AS;
            (1): major set is (v, v[c1]) and (v[c], v[c2]) and (v, v[c]) and (v[c1], v[c][c1])
        endif
    endfor
endfor
[Figure: edges labeled c1, c]
In this case, there are two major sets.

16
Adequate Subgraph
[Figure: adequate subgraphs, 6 check marks]

17
Algorithm: AS4(), type core
[Figure labels: p1, p2, po0, po1, po2, po11, po22; c0, c1, c2]

18
Adequate Subgraph
[Figure: adequate subgraphs, 10 check marks]

19
Algorithm: AS4()

20
Adequate Subgraph
[Figure: adequate subgraphs, 13 check marks]

21
Algorithm: AS4()

22

23
Adequate Subgraph
[Figure: adequate subgraphs, 18 check marks]

24
Algorithm: Shrink()

25

26
Algorithm: Shrink()

27
Branch and Bound Algorithm

28
Branch and Bound Algorithm (1)
If no branch has the current upper bound, decrease the bound. If no elements are left in memory, load others from disk.

29
Branch and Bound Algorithm (2)
Get an intermediate subgraph and check whether it can be trimmed or whether it is the final solution. If there are too many elements in memory, store them on disk.
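The two steps can be sketched as a generic best-first loop with buckets indexed by upper bound. This is an illustration under stated assumptions, not the implementation behind the slides: bounds are integers, a child's bound never exceeds its parent's, states can be pickled, and the callbacks `bound`, `is_final`, and `children` as well as the limit `max_in_memory` are hypothetical names. A toy knapsack stands in for the median search.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def branch_and_bound(root, bound, is_final, children, max_in_memory=1000):
    """Best-first maximization with buckets per bound and disk spilling."""
    with tempfile.TemporaryDirectory() as spill_dir:
        memory = defaultdict(list)    # bound -> states kept in memory
        on_disk = defaultdict(list)   # bound -> files holding spilled states
        in_memory = 0

        def push(state):
            nonlocal in_memory
            memory[bound(state)].append(state)
            in_memory += 1
            if in_memory > max_in_memory:
                # Too many elements in memory: store the weakest bucket on disk.
                worst = min(b for b in memory if memory[b])
                fd, path = tempfile.mkstemp(dir=spill_dir)
                with os.fdopen(fd, "wb") as handle:
                    pickle.dump(memory[worst], handle)
                on_disk[worst].append(path)
                in_memory -= len(memory[worst])
                del memory[worst]

        upper = bound(root)
        push(root)
        while in_memory or any(on_disk.values()):
            if not memory[upper]:
                if on_disk[upper]:    # nothing in memory: load from disk
                    path = on_disk[upper].pop()
                    with open(path, "rb") as handle:
                        memory[upper] = pickle.load(handle)
                    os.remove(path)
                    in_memory += len(memory[upper])
                else:                 # no branch has this bound: decrease it
                    upper -= 1
                continue
            state = memory[upper].pop()
            in_memory -= 1
            if is_final(state):       # first final state at the bound is optimal
                return state
            for child in children(state):
                push(child)
        return None


# Toy problem: states are (next item index, value so far, capacity left).
VALUES, WEIGHTS = [6, 5, 4], [3, 2, 2]

def knapsack_bound(state):
    index, value, _ = state
    return value + sum(VALUES[index:])

def knapsack_final(state):
    return state[0] == len(VALUES)

def knapsack_children(state):
    index, value, left = state
    yield (index + 1, value, left)                      # skip the item
    if WEIGHTS[index] <= left:                          # take it if it fits
        yield (index + 1, value + VALUES[index], left - WEIGHTS[index])

best = branch_and_bound((0, 0, 4), knapsack_bound, knapsack_final,
                        knapsack_children, max_in_memory=1)
print(best)  # (3, 9, 0)
```

Spilling the bucket with the lowest bound keeps the most promising states in memory; other eviction policies are equally possible.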

30
Upper Bound and Lower Bound: Upper Bound
DCJ distances between genomes obey the triangle inequality. So, given three genomes G1, G2, G3, the distances from the median genome to them satisfy: [formula]
Because the distance is defined by: [formula]
the upper bound for the cycle number is therefore: [formula]
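The formulas of this slide are not part of the extracted text. One way to reconstruct the argument, assuming circular genomes on n genes so that the DCJ distance is n minus the number of cycles in the breakpoint graph, is sketched below. The symbols M (median genome), d (DCJ distance), c (cycle count), and n are introduced here for illustration.

```latex
\begin{align}
d(M,G_i) + d(M,G_j) &\ge d(G_i,G_j), \qquad 1 \le i < j \le 3 \\
\sum_{i=1}^{3} d(M,G_i) &\ge \tfrac{1}{2}\bigl(d(G_1,G_2) + d(G_1,G_3) + d(G_2,G_3)\bigr) \\
d(A,B) &= n - c(A,B) \\
\sum_{i=1}^{3} c(M,G_i) &\le \tfrac{3n}{2} + \tfrac{1}{2}\bigl(c(G_1,G_2) + c(G_1,G_3) + c(G_2,G_3)\bigr)
\end{align}
```

The second line sums the three triangle inequalities and halves the result; the last line follows by substituting the third line into the second.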

32
Best-First Search
Best-first search ensures that the search space is minimal. However, it needs a lot of space to store its footprint, which makes the branch-and-bound algorithm I/O-bound.

33
Reference
[1] Andrew Wei Xu and David Sankoff. Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem. In K.A. Crandall and J. Lagergren (Eds.): Proceedings of the Workshop on Algorithms in Bioinformatics, WABI 2008, Lecture Notes in Bioinformatics 5251, Springer.
[2] Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinform. 21, 3340-3346 (2005).
[3] Andrew Wei Xu. A Fast and Exact Algorithm for the Median of Three Problem: a Graph Decomposition Approach. Journal of Computational Biology, 2009, 16(10), 1-13.
