Parallel #2 Paper – Phylogeny and Branch and Bound Algorithms George McGinn

Parallel #2 Paper – Phylogeny and Branch and Bound Algorithms George McGinn (georgemcginn@yahoo.com)

Tree Building (Phylogeny)
PHYLIP (Phylogeny Inference Package): PHYLIP is a free package of programs for inferring phylogenies. One of the most popular such packages, it is currently available for all the major OSes and features over a dozen different algorithms for building trees. I chose the Penny algorithm and looked into possible parallel implementations of it.

Finding the best Tree Algorithms
3 types of solutions:
Exhaustive: search every tree.
Heuristic: use some algorithm to make a good (possibly best) tree.
Most Parsimonious: prove that a given tree is the best tree, (hopefully) without searching every tree.

Exhaustive
Number of elements = N
Number of trees, T(N):
T(1) = 1
T(2) = 1
T(3) = T(2) * 3
T(4) = T(3) * 4
T(N) = T(N-1) * N
Positioning nodes: simple factorial, n!
Big O = n! * n! = (n!)^2 = way too big!
10 nodes = T(10) * 10! = (10!)^2 / 2 = 6,584,094,720,000
10 nodes (eliminating mirror images): still about 35 million
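The counts on this slide can be checked with a few lines of Python. This is a minimal sketch that reads the recurrence as T(N) = T(N-1) * N with T(2) = 1. The function names are made up for illustration. The mirror-image-free figure is assumed here to be the number of rooted bifurcating trees on n labeled species, (2n-3)!!, which the slide does not state explicitly.

```python
from math import factorial


def slide_tree_count(n):
    """T(n) from the slide: T(1) = T(2) = 1 and T(n) = T(n-1) * n."""
    count = 1
    for k in range(3, n + 1):
        count *= k
    return count


def exhaustive_count(n):
    """Tree shapes times node positionings: T(n) * n!."""
    return slide_tree_count(n) * factorial(n)


def rooted_bifurcating_trees(n):
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3); assumed meaning of 'no mirror images'."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    print(exhaustive_count(10))          # 6584094720000
    print(rooted_bifurcating_trees(10))  # 34459425
```

Under that assumption the second figure is 34,459,425, which matches the "about 35 million" on the slide.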

Heuristic Search
Types of PHYLIP searches: Neighbor, Factor, GENDIST
An algorithm finds the "best" tree, but the result depends on the order in which the species are received, so a Jumble option is provided to try different input orders and see the different trees possible.
Very fast, but inexact. Returns one tree (generally).

Most Parsimonious Trees
Penny (DNAPenny): uses Branch and Bound to come up with the optimal solution(s).
Also: CLIQUE searches.

Branch and Bound sidetrack
Traveling Salesman example: line up all the possible solutions, fully calculate one, and then attempt all the rest (Depth First Search).
When a partial solution must be worse than the best one found so far, disregard that node or subtree.
If a solution is better than the previous best one, it becomes the new best solution. If equal, save it to a list.
Does not HAVE to try all possibilities: efficiency depends very much on the input order and the data, and the search may actually calculate them all.
Note that if all subtrees need to be explored, this can actually be slower than exhaustive search due to algorithm overhead!
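The Traveling Salesman description above can be sketched as a short depth-first search. This is only an illustration under stated assumptions: the function name `tsp_branch_and_bound` and the distance-matrix input are hypothetical, distances are assumed non-negative, and the bound used is the simplest possible one (the cost of the partial tour so far).

```python
def tsp_branch_and_bound(dist):
    """Return (best_cost, best_tours) for a square matrix of non-negative distances."""
    n = len(dist)
    best_cost = float("inf")
    best_tours = []

    def extend(tour, cost, unvisited):
        nonlocal best_cost, best_tours
        # Bound: a partial tour already costlier than the best complete tour
        # can only get worse, so the whole subtree is disregarded.
        if cost > best_cost:
            return
        if not unvisited:
            total = cost + dist[tour[-1]][tour[0]]  # close the loop
            if total < best_cost:
                best_cost, best_tours = total, [tour]  # new best solution
            elif total == best_cost:
                best_tours.append(tour)  # equal: save to the list
            return
        # Branch: try each remaining city next, depth first.
        for city in sorted(unvisited):
            extend(tour + [city], cost + dist[tour[-1]][city], unvisited - {city})

    extend([0], 0, frozenset(range(1, n)))
    return best_cost, best_tours


if __name__ == "__main__":
    example = [
        [0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0],
    ]
    print(tsp_branch_and_bound(example))
```

The first tour tried sets the initial bound, so the order in which cities are visited decides how much of the search is pruned, which is the input-order sensitivity mentioned above.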

Back to Penny
Add all nodes in order, then backtrack.
Make tree of first two species: (A,B)
  Add C in first place: ((A,B),C)
    Add D in first place: (((A,D),B),C)
    Add D in second place: ((A,(B,D)),C)
    Add D in third place: (((A,B),D),C)
    Add D in fourth place: ((A,B),(C,D))
    Add D in fifth place: (((A,B),C),D)
  Add C in second place: ((A,C),B)
    Add D in first place: (((A,D),C),B)
    Add D in second place: ((A,(C,D)),B)
    Add D in third place: (((A,C),D),B)
    Add D in fourth place: ((A,C),(B,D))
    Add D in fifth place: (((A,C),B),D)
  Add C in third place: (A,(B,C))
    Add D in first place: ((A,D),(B,C))
    Add D in second place: (A,((B,D),C))
    Add D in third place: (A,(B,(C,D)))
    Add D in fourth place: (A,((B,C),D))
    Add D in fifth place: ((A,(B,C)),D)
And so forth!
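The stepwise addition shown above can be reproduced with a small generator. This is a sketch, not PHYLIP's code: trees are modeled as nested Python tuples, the helper names are invented for illustration, and the order in which the "places" are visited is not guaranteed to match the order Penny uses.

```python
def insertions(tree, new):
    """Yield every tree made by attaching species `new` to one branch of `tree`."""
    # Attach on the branch leading to this node (or above the root).
    yield (tree, new)
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insertions(left, new):
            yield (sub, right)
        for sub in insertions(right, new):
            yield (left, sub)


def all_trees(species):
    """Start from the first two species and add the rest one at a time."""
    trees = [(species[0], species[1])]
    for s in species[2:]:
        trees = [t for old in trees for t in insertions(old, s)]
    return trees


def newick(tree):
    """Format a nested tuple in the parenthesized notation used on the slide."""
    if isinstance(tree, tuple):
        return "(" + ",".join(newick(child) for child in tree) + ")"
    return tree


if __name__ == "__main__":
    for t in all_trees(["A", "B", "C", "D"]):
        print(newick(t))
```

With four species this prints 15 trees: three places for C, then five places for D in each. A branch and bound search would walk the same structure depth first and abandon a partial tree as soon as its score is already worse than the best complete tree.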

Parallelization of the Branch and Bound Algorithm on Distributed memory machines
Problem groups should be split into blocks that are large enough and uniform in size, so that a single integer can identify which block is currently being examined.
Each processor initially takes a certain range, and has its next-block integer set to the number of processors.
When a processor is done, it broadcasts that it is taking the next block (so all of the other processors increment their counters), and then starts to process it.
On really large networks, this would probably best be modified to remove all the communication overhead by running things in lockstep. Lockstep is not optimal, though, as some blocks will have their trees pruned almost immediately and finish far sooner than others.
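The block-claiming scheme above can be simulated on one machine to see how blocks get distributed. This is a toy sketch with loud assumptions: there is no real message passing, a broadcast is treated as instantaneous (so two processors never claim the same block), and `block_costs` is a made-up list of how long each block takes to search.

```python
import heapq


def simulate(num_procs, block_costs):
    """Return, for each processor, the list of block ids it ends up processing."""
    num_blocks = len(block_costs)
    # Each processor keeps its own copy of the next-block counter,
    # initially set to the number of processors.
    counters = [num_procs] * num_procs
    handled = [[] for _ in range(num_procs)]
    events = []  # (finish_time, processor)
    # Processor p starts on block p.
    for p in range(min(num_procs, num_blocks)):
        handled[p].append(p)
        heapq.heappush(events, (block_costs[p], p))
    while events:
        now, p = heapq.heappop(events)
        block = counters[p]
        if block >= num_blocks:
            continue  # no blocks left to claim
        # "Broadcast": every processor increments its counter (assumed instantaneous).
        for q in range(num_procs):
            counters[q] += 1
        handled[p].append(block)
        heapq.heappush(events, (now + block_costs[block], p))
    return handled


if __name__ == "__main__":
    # Block 0 is expensive; the others are pruned almost immediately.
    print(simulate(3, [5, 1, 1, 1, 1, 1, 1]))
```

In this run the processor stuck on the expensive block handles only that block while the other two share the rest, which is the load balancing that a lockstep assignment would give up.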

Parallelization of the Branch and Bound Algorithm on Distributed memory machines, pt 2
In the setup above, the possible problem groups might be the different variations for where the C was added (3 variations).
Problems with this:
On distributed memory systems, all the messages for taking new blocks might overload the network.
The message indicating that a new best solution has been found could be transmitted only at the end of a block, to ensure that these messages do not also get out of hand. This may cause extra paths to be traversed that otherwise would be skipped.

Links and Fun:
Phylip (Phylogeny Inference Package): http://www.molbiol.ox.ac.uk/documentation/phylip/index.html
Penny Algorithms: http://www.molbiol.ox.ac.uk/documentation/phylip/penny.html
A parallel synchronized branch and bound algorithm: http://www.epfl.ch/SIC/SA/publications/SCR94/6-94-page15.html (EPFL Supercomputing Review, n. 6, Nov. 94). This was my originally intended paper; however, it ended up being too dense for me to present it sufficiently!
Branch and Bound intro: http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/discrete/integerprog/section2_1_1.html
Original idea for implementing B&B with Phylogeny: Hendy, M. D., and D. Penny. 1982. Branch and bound algorithms to determine minimal evolutionary trees. Mathematical Biosciences 59: 277-290.
