Rooting the tree and giving length to branches

Rooted vs. unrooted trees

The position of the root does not affect the MP (maximum parsimony) score. Exercise: draw all alternative rootings of the MP tree, evaluate one of them, and show that the MP score does not change.

More intuition for why rooting does not change the score: [Figure: a tree of s1, s4, s3, s2 and s5 for gene number 1.] The change will always be on the same branch, no matter where the root is positioned.
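To check the claim numerically, here is a minimal sketch of Fitch's small-parsimony algorithm (our choice of method for this illustration) that places a virtual root in the middle of each branch in turn. The tree layout for s1 to s5, the internal node names x, y, z and the character states are made up; the point is only that every rooting gives the same score.

```python
def fitch_score(adj, states, root_edge):
    """Fitch small-parsimony score for one character on an unrooted binary tree.

    adj: dict mapping each node to the set of its neighbors (leaves have one).
    states: dict mapping each leaf to its character state, e.g. {"s1": "A"}.
    root_edge: (u, v); a virtual root is placed in the middle of this branch.
    """
    def down(node, parent):
        # Returns (candidate state set, number of changes below this node).
        if node in states:
            return {states[node]}, 0
        (set_a, cost_a), (set_b, cost_b) = [
            down(child, node) for child in adj[node] - {parent}
        ]
        common = set_a & set_b
        if common:
            return common, cost_a + cost_b
        return set_a | set_b, cost_a + cost_b + 1

    u, v = root_edge
    (set_u, cost_u), (set_v, cost_v) = down(u, v), down(v, u)
    return cost_u + cost_v + (0 if set_u & set_v else 1)


# Hypothetical five-taxon tree with internal nodes x, y, z.
tree = {
    "s1": {"x"}, "s4": {"x"}, "s3": {"y"}, "s2": {"z"}, "s5": {"z"},
    "x": {"s1", "s4", "y"}, "y": {"x", "s3", "z"}, "z": {"y", "s2", "s5"},
}
states = {"s1": "A", "s4": "A", "s3": "C", "s2": "C", "s5": "C"}  # made up

edges = {tuple(sorted((a, b))) for a in tree for b in tree[a]}
for edge in sorted(edges):
    print(edge, fitch_score(tree, states, edge))  # 1 for every root position
```

Rooting on any of the 2n - 3 = 7 branches gives the same score; only the bookkeeping of which node sits "above" which changes.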

How can we root the tree? We want rooted trees!

Example: Gorilla gorilla (gorilla), Homo sapiens (human), Pan troglodytes (chimpanzee) and Gallus gallus (chicken).

Evaluate all 3 possible UNROOTED trees, (Human, Chimp | Chicken, Gorilla), (Human, Gorilla | Chimp, Chicken) and (Human, Chicken | Chimp, Gorilla), and pick the MP tree (marked in the figure).
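The surviving text does not say how the root is then chosen. A common approach, and presumably the reason a distant taxon such as the chicken is included, is outgroup rooting: place the root on the branch leading to the outgroup. A minimal sketch, assuming the unrooted MP tree groups Human with Chimp (a hypothetical choice, since the figure is not available) and writing the rooted result in Newick format:

```python
def to_newick(adj, node, parent):
    """Write the subtree hanging from `node` (seen from `parent`) as Newick."""
    children = sorted(adj[node] - {parent})
    if not children:                    # a leaf: its only neighbor is its parent
        return node
    return "(" + ",".join(to_newick(adj, c, node) for c in children) + ")"


def root_on_outgroup(adj, outgroup):
    """Put the root on the branch leading to `outgroup`; return a Newick string."""
    (attach,) = adj[outgroup]           # the internal node next to the outgroup
    return "(" + to_newick(adj, attach, outgroup) + "," + outgroup + ");"


# Hypothetical unrooted tree Human, Chimp | Gorilla, Chicken (internal x, y).
unrooted = {
    "Human": {"x"}, "Chimp": {"x"}, "Gorilla": {"y"}, "Chicken": {"y"},
    "x": {"Human", "Chimp", "y"}, "y": {"x", "Gorilla", "Chicken"},
}
print(root_on_outgroup(unrooted, "Chicken"))  # ((Gorilla,(Chimp,Human)),Chicken);
```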

HOW MANY TREES

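How many rooted trees? Let TR(n) ("TREE ROOTED") be the number of rooted trees with n sequences. For n = 2 there is a single tree, (a,b), so TR(2) = 1. For n = 3 there are three trees, ((a,b),c), ((a,c),b) and ((b,c),a), so TR(3) = 3. For n = 4, TR(4) = 15. [Figure: the rooted trees of a, b, c and d.]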

How many rooted trees? The tree (a,b) has 2 branches, and there are 3 possible places to add "c": on either branch or above the root. The resulting trees have 4 branches and 5 possible places to add "d"; the next ones have 6 branches and 7 possible places to add "e". The number of branches increases by 2 each time, so it forms an arithmetic series 0, 2, 4, 6, 8, …: $A(n) = A(1) + (n-1)d$ with $A(1) = 0$ and $d = 2$, hence $A(n) = 2(n-1) = 2n-2$.

How many rooted trees (TR = "TREE ROOTED")? Each time, a new sequence can be added in $Br(n)+1$ places, where $Br(n) = 2n-2$ is the number of branches of a rooted tree with n sequences. Therefore
$$TR(n+1) = TR(n)\,\bigl(Br(n)+1\bigr) = TR(n)\,(2n-1).$$
For example, $TR(5) = TR(4)\cdot 7 = TR(3)\cdot 5\cdot 7 = TR(2)\cdot 3\cdot 5\cdot 7 = 1\cdot 3\cdot 5\cdot 7 = 105$, and in general $TR(n) = 1\cdot 3\cdot 5\cdots(2n-3)$, where $TR(n)$ is the number of rooted trees with n sequences.
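The stepwise argument translates directly into code. A minimal sketch, with rooted trees written as nested Python tuples (a representation assumed for this example): each subtree corresponds to one branch, and the whole tree to the place above the root, so a tree with k leaves offers 2k - 1 places for the next leaf.

```python
def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`.

    A tree is a leaf name (str) or a 2-tuple of subtrees. Attaching above a
    subtree means inserting a new node on the branch leading to it (or above
    the root, for the whole tree), so a tree with k leaves offers 2k - 1 places.
    """
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """All rooted binary trees on `taxa`, built by adding one taxon at a time."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insertions(tree, leaf)]
    return trees


print(all_rooted_trees("abc"))  # [(('a', 'b'), 'c'), (('a', 'c'), 'b'), ('a', ('b', 'c'))]
for n in range(2, 7):
    print(n, len(all_rooted_trees("abcdef"[:n])))  # TR(n) = 1, 3, 15, 105, 945
```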

How many rooted trees (TR = "TREE ROOTED")? Recall $n! = 1\cdot 2\cdot 3\cdots n$ (n factorial). Dividing the full product $1\cdot 2\cdots(2n-3)$ by its even factors gives a closed form:
$$TR(n) = 1\cdot 3\cdot 5\cdots(2n-3) = \frac{1\cdot 2\cdot 3\cdots(2n-3)}{2\cdot 4\cdot 6\cdots(2n-4)} = \frac{(2n-3)!}{(2\cdot 1)(2\cdot 2)\cdots\bigl(2(n-2)\bigr)} = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

How many rooted trees (TR = "TREE ROOTED")?
$$TR(n) = 1\cdot 3\cdot 5\cdots(2n-3) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} = (2n-3)!!$$
where $!!$ denotes the double factorial.
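A quick numerical check that the product, the closed form and the recursion agree; the function names are hypothetical:

```python
from math import factorial


def tr_product(n):
    """TR(n) = 1 * 3 * 5 * ... * (2n - 3), i.e. the double factorial (2n - 3)!!."""
    result = 1
    for k in range(3, 2 * n - 2, 2):    # odd numbers 3, 5, ..., 2n - 3
        result *= k
    return result


def tr_closed(n):
    """TR(n) = (2n - 3)! / (2^(n - 2) * (n - 2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def tr_recursive(n):
    """TR(2) = 1 and TR(m + 1) = TR(m) * (2m - 1)."""
    count = 1
    for m in range(2, n):
        count *= 2 * m - 1
    return count


for n in (3, 4, 5, 10, 20):
    print(n, tr_product(n), tr_closed(n), tr_recursive(n))
```

For 10 sequences all three give 34,459,425 rooted trees.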

How many unrooted trees? Exercise: show that the number of unrooted trees is $1\cdot 3\cdot 5\cdots(2n-5)$, where n is the number of sequences. Open questions: for some related counts (e.g., when multifurcating trees are allowed), a closed formula does not exist, though a recursion formula does (Felsenstein 1987; Schröder 1870). There are other results about the asymptotic rate at which these numbers grow, about the number of tree shapes, etc.
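One way to approach the exercise is to count branches: an unrooted tree with n taxa has 2n - 3 branches, so adding a taxon multiplies the count by 2n - 3, and choosing one of those branches for the root turns an unrooted tree into a rooted one. A small sketch that checks both relations numerically (an illustration, not a proof):

```python
def unrooted_trees(n):
    """Unrooted binary trees on n >= 3 taxa: 1 * 3 * 5 * ... * (2n - 5)."""
    result = 1
    for k in range(3, 2 * n - 4, 2):
        result *= k
    return result


def rooted_trees(n):
    """Rooted binary trees on n >= 2 taxa: 1 * 3 * 5 * ... * (2n - 3)."""
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result


for n in range(3, 11):
    # Each unrooted tree has 2n - 3 branches, each a possible place for the root.
    assert rooted_trees(n) == unrooted_trees(n) * (2 * n - 3)
    # Adding taxon n + 1: one new unrooted tree per existing branch.
    assert unrooted_trees(n + 1) == unrooted_trees(n) * (2 * n - 3)
    print(n, unrooted_trees(n), rooted_trees(n))
```

For four taxa this gives the 3 unrooted trees evaluated in the primate example.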

HEURISTIC SEARCH

There are too many trees to go over all of them, so we will try to find the best tree in a smarter way. These are approximate solutions: they are not guaranteed to find the best tree.

Finding the maximum is the same as finding the minimum. Say we have a computer procedure that finds the minimum of a given function, and we want to find the maximum of a function f(x). We can simply find the minimum of -f(x): it equals minus the maximum of f(x), and it is attained at the same x. Example: f(0) = 3, f(1) = 7, f(2) = -5, f(3) = 0, so max f(x) = 7 and argmax f(x) = 1. Then -f(0) = -3, -f(1) = -7, -f(2) = 5, -f(3) = 0, so min(-f(x)) = -7 and argmin(-f(x)) = 1.
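The same trick in code, using the example values from the text; `minimize` is a hypothetical stand-in for the "computer procedure" that finds a minimum, here over a finite set of points:

```python
def minimize(func, domain):
    """Stand-in minimizer over a finite domain: returns (argmin, minimum)."""
    best = min(domain, key=func)
    return best, func(best)


def maximize(func, domain):
    """Maximize func by minimizing -func: same argument, value with flipped sign."""
    best, neg_value = minimize(lambda x: -func(x), domain)
    return best, -neg_value


values = {0: 3, 1: 7, 2: -5, 3: 0}               # f(x) from the example
print(minimize(lambda x: -values[x], values))    # (1, -7)
print(maximize(lambda x: values[x], values))     # (1, 7)
```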

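[Figures: a greedy hill-climbing walk through tree space. The search starts from a tree with score 1700 and evaluates its neighbors (scores 1700, 1825, 1710, 1410, 1695), moves to the best one (1825), and evaluates its neighbors in turn (1828, 1910, 1800), until it reaches a tree with a maximal score of 2900.]

Problem number 1: local maximum. [Figure: trees with scores 2100, 2900 (a local maximum) and 3100 (the global maximum).]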

This algorithm is "greedy": it seizes the first improvement it encounters. One way to avoid local maxima is to start from many random starting points and keep the best result.
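A minimal sketch of greedy hill climbing with random restarts. As a simplifying assumption, the "trees" are positions 0 to 6 on a one-dimensional landscape whose neighbors are the adjacent positions, with made-up scores echoing the figure (a local maximum of 2900 and a global maximum of 3100):

```python
import random


def hill_climb(scores, start):
    """Greedy ascent; neighbors of position i are i - 1 and i + 1.

    Like the greedy search above, it moves to the first improvement it finds.
    """
    current = start
    while True:
        for j in (current - 1, current + 1):
            if 0 <= j < len(scores) and scores[j] > scores[current]:
                current = j
                break
        else:
            return current      # no neighbor improves: a (possibly local) maximum


def random_restarts(scores, n_starts, seed=0):
    """Run hill_climb from n_starts random starting points; keep the best peak."""
    rng = random.Random(seed)
    peaks = [hill_climb(scores, rng.randrange(len(scores)))
             for _ in range(n_starts)]
    return max(peaks, key=lambda i: scores[i])


landscape = [1700, 1825, 1910, 2900, 2100, 3100, 1800]  # hypothetical scores
print(hill_climb(landscape, 0))         # 3: stuck on the local maximum
print(random_restarts(landscape, 50))
```

A single climb from position 0 stops at 2900; any restart that lands on position 5 or 6 reaches 3100, so enough restarts find the global maximum on this toy landscape.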

There are several options to define a neighbor. [Figure: Option 1 and Option 2.]

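Nearest-neighbor interchange (NNI). [Figure: a tree with subtrees A, B, C and D around one internal branch, and its two neighbors.] Each internal branch defines two neighbors: with A, B on one side and C, D on the other, swapping subtrees across the branch gives A, C | B, D and A, D | B, C.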

How many neighbors do we check each time? An unrooted tree of n taxa has 2n-3 branches. Only the n-3 internal branches are of interest, since NNI is possible only on internal branches (the other n branches are external, leading to leaves). Each internal branch defines two neighbors, so the total number of neighbors in each NNI cycle is 2(n-3) = 2n-6. [Figure: a five-taxon tree (A to E) with its internal and external branches marked.]
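A sketch of NNI on an unrooted binary tree stored as an adjacency dictionary (a representation chosen for this example). For each internal branch it builds the two neighbors by swapping subtrees across the branch; `leaf_splits` is a hypothetical helper used only to confirm that the neighbors are distinct topologies.

```python
def nni_neighbors(adj):
    """All NNI neighbors of an unrooted binary tree (node -> set of neighbors).

    For an internal branch (u, v), u carries two other subtrees {a, b} and v
    carries {c, d}; swapping b with c, or b with d, gives the two neighbors.
    """
    internal = {node for node in adj if len(adj[node]) == 3}
    neighbors = []
    for u in sorted(internal):
        for v in sorted(adj[u]):
            if v not in internal or not u < v:   # visit each internal branch once
                continue
            _, b = sorted(adj[u] - {v})
            for c in sorted(adj[v] - {u}):
                new = {node: set(nbrs) for node, nbrs in adj.items()}
                new[u].remove(b)                 # detach b from u
                new[b].remove(u)
                new[v].remove(c)                 # detach c from v
                new[c].remove(v)
                new[u].add(c)                    # reattach them crosswise
                new[c].add(u)
                new[v].add(b)
                new[b].add(v)
                neighbors.append(new)
    return neighbors


def leaf_splits(adj):
    """Non-trivial splits of the tree, each stored as one canonical side."""
    leaves = frozenset(node for node in adj if len(adj[node]) == 1)

    def side(node, parent):
        if node in leaves:
            return {node}
        return set().union(*(side(c, node) for c in adj[node] - {parent}))

    splits = set()
    for u in adj:
        for v in adj[u]:
            if u not in leaves and v not in leaves:
                s = frozenset(side(v, u))
                splits.add(min(s, leaves - s, key=sorted))
    return splits


# Hypothetical five-taxon tree with internal nodes x, y, z.
tree = {
    "A": {"x"}, "B": {"x"}, "C": {"y"}, "D": {"z"}, "E": {"z"},
    "x": {"A", "B", "y"}, "y": {"x", "C", "z"}, "z": {"y", "D", "E"},
}
print(len(nni_neighbors(tree)))  # 4 = 2n - 6 for n = 5
```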

I am greedy

Greedy variants:
(1) Most greedy: start searching your neighbors. If you find something better, move there and start the search again.
(2) Just greedy: check ALL your neighbors and move to the one with the highest score.
(3) Smart greedy: try all NNIs of all the trees that are tied for the best score.
There are many other variants of greedy search that will not be discussed in this course.
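Variants (1) and (2) differ only in how the next tree is picked. A minimal sketch on a hypothetical one-dimensional landscape (variant 3 is not shown), in which the two variants end on different peaks:

```python
def most_greedy_step(current, neighbors, score):
    """Variant 1: scan the neighbors; return the first that improves (or None)."""
    return next((n for n in neighbors(current) if score(n) > score(current)),
                None)


def just_greedy_step(current, neighbors, score):
    """Variant 2: evaluate ALL neighbors; return the best if it improves."""
    best = max(neighbors(current), key=score)
    return best if score(best) > score(current) else None


def climb(start, neighbors, score, step):
    """Repeat the chosen step rule until no neighbor improves the score."""
    current = start
    while (nxt := step(current, neighbors, score)) is not None:
        current = nxt
    return current


scores = [4, 2, 1, 3, 6, 7]     # hypothetical scores of positions 0..5


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]


def score(i):
    return scores[i]


print(climb(2, neighbors, score, most_greedy_step))  # 0 (score 4)
print(climb(2, neighbors, score, just_greedy_step))  # 5 (score 7)
```

Starting from position 2, the most greedy variant takes the first improvement (position 1) and ends at a peak of 4, while the just-greedy variant looks at both neighbors, picks the better one and reaches 7.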

SPR = SUBTREE PRUNING AND REGRAFTING
1. Choose a branch and cut it in two.
2. Remove the sticky end from one subtree.
3. Connect the remaining sticky end to one branch of the other subtree.
[Figure: a tree of taxa A to E before and after SPR.]

TBR = TREE BISECTION AND RECONNECTION
1. Choose a branch and cut it in two.
2. Remove the sticky ends from both subtrees.
3. Connect the two remaining subtrees anywhere (any branch of one to any branch of the other).
[Figure: a tree of taxa A to F before and after TBR.]

Sequential addition
1. Start with a 3-taxon tree.
2. Evaluate all possible additions of the next taxon and keep the best one (red in the figure).
One can also do rearrangements at each addition step to increase efficiency.
[Figure: taxa A to E added one at a time.]
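A minimal sketch of sequential (stepwise) addition under parsimony, without rearrangements. Assumptions: Fitch's algorithm scores each alignment column, and an unrooted tree is stored as (first taxon, rooted tree of the others), which is enough because the root position does not change the MP score. The tiny alignment is made up:

```python
def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def fitch(tree, seqs, i):
    """Fitch pass for alignment column i: (candidate state set, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (a, ca), (b, cb) = fitch(tree[0], seqs, i), fitch(tree[1], seqs, i)
    return (a & b, ca + cb) if a & b else (a | b, ca + cb + 1)


def parsimony(tree, seqs):
    columns = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, i)[1] for i in range(columns))


def sequential_addition(taxa, seqs):
    """Greedy stepwise addition: put each new taxon on its best branch."""
    outgroup, first, second, *rest = taxa
    backbone = (first, second)          # with the outgroup: the 3-taxon tree
    for leaf in rest:
        backbone = min(insertions(backbone, leaf),
                       key=lambda t: parsimony((outgroup, t), seqs))
    tree = (outgroup, backbone)
    return tree, parsimony(tree, seqs)


seqs = {"A": "GC", "B": "GC", "C": "TC", "D": "TA", "E": "TA"}  # made up
print(sequential_addition(list("ABCDE"), seqs))
# (('A', ('B', ('C', ('D', 'E')))), 2)
```

Being greedy, this keeps only one tree per step, so on harder data it can miss the MP tree; that is where the rearrangements mentioned above come in.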

Star decomposition
1. Start with an n-taxon star tree.
2. In each step, find the best pair of taxa to separate from the star's root and group them together (red in the figure).
One can also do rearrangements at each step to increase efficiency.
[Figure: a star tree of taxa A to E being resolved, starting with the pair (C,B).]

Simulated annealing is another method to avoid local maxima. The idea is to relax the greediness by allowing steps to go downhill. For example, pick one NNI neighbor at random. If it is uphill, move there; if it is downhill, move there with a certain probability p. We control p: at the beginning of the search p is high, and as the search progresses p is reduced (i.e., the search becomes more greedy).
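A toy sketch of the idea. For simplicity it walks a one-dimensional landscape instead of NNI neighbors and lowers p linearly, which is only one of many possible schedules; the scores are made up:

```python
import random


def simulated_annealing(scores, start, steps, p_start, seed=0):
    """Toy annealing; neighbors of position i are i - 1 and i + 1.

    Each step proposes a random neighbor. Uphill (or level) moves are always
    accepted; a downhill move is accepted with probability p, which decreases
    linearly from p_start to 0. Returns the best position visited.
    """
    rng = random.Random(seed)
    current = best = start
    for step in range(steps):
        p = p_start * (1 - step / steps)
        options = [j for j in (current - 1, current + 1) if 0 <= j < len(scores)]
        candidate = rng.choice(options)
        if scores[candidate] >= scores[current] or rng.random() < p:
            current = candidate
        if scores[current] > scores[best]:
            best = current
    return best


landscape = [1700, 1825, 1910, 2900, 2100, 3100, 1800]  # hypothetical scores
print(simulated_annealing(landscape, 0, 2000, p_start=0.0))  # 3: never goes downhill
print(simulated_annealing(landscape, 0, 2000, p_start=0.5))
```

With p fixed at 0 the walk behaves like the greedy search and stays on the local maximum (2900); allowing downhill moves early on lets it cross the dip at 2100 and find 3100.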

Branch and Bound

There are too many trees to go over all of them, and heuristic searches only give approximate solutions. But what if we want to make sure we find the global maximum? There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND, a general technique in computer science that can be applied to phylogeny.

BRANCH AND BOUND. To illustrate the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution: the shortest Hamiltonian path (SHP) problem. Later, once the general BNB method is understood, we will see how to apply it to finding the MP tree.

THE SHP PROBLEM (adapted to Israel). A guard has to visit n check-points on a map. The problem is to find the shortest route (including the choice of starting point) that goes through all the points. Naïve approach (say, for 5 points): there are 5 possible starting points. For each starting point there are 4 possible next steps; for each combination of starting point and first step there are 3 possible second steps, and so on. Altogether there are 5*4*3*2*1 = 5! = 120 possible solutions.

THE SHP TREE [Figure: the search tree of the naïve approach, in which each level chooses the next check-point.]

THE SHP NAÏVE APPROACH. Each solution can be represented as a permutation: (1,2,3,4,5), (1,2,3,5,4), (1,2,4,3,5), (1,2,4,5,3), (1,2,5,3,4), … We can go over the list and find the one giving the shortest route.
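The naïve approach as a sketch, with hypothetical check-points given as (x, y) coordinates and straight-line distances (both assumptions of this example):

```python
from itertools import permutations
from math import dist, factorial


def path_length(points, order):
    """Length of the route that visits the points in the given order."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def shortest_path_brute_force(points):
    """Naïve SHP: try every permutation (every start and every order)."""
    orders = list(permutations(range(len(points))))
    best = min(orders, key=lambda order: path_length(points, order))
    return best, path_length(points, best), len(orders)


points = [(2, 0), (0, 0), (4, 0), (1, 0), (3, 0)]  # hypothetical, on a line
print(shortest_path_brute_force(points))  # ((1, 3, 0, 4, 2), 4.0, 120)
print(factorial(15))                      # 1307674368000 orders for 15 points
```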

THE SHP NAÏVE APPROACH. However, for 15 points, for example, there are 1,307,674,368,000 (15!) permutations. The number of solutions grows too fast (faster than exponentially).

THE SHP HEURISTIC APPROACH. Start from a random point and always go to the closest point not yet visited. This approach does not work very well.
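A sketch of this nearest-neighbor heuristic on hypothetical points placed on a line, compared with the brute-force optimum; from this start point the greedy route is much longer than the best one:

```python
from itertools import permutations
from math import dist


def path_length(points, order):
    """Length of the route that visits the points in the given order."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor_path(points, start):
    """Greedy heuristic: always go to the closest point not yet visited."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        here = points[order[-1]]
        closest = min(unvisited, key=lambda j: dist(here, points[j]))
        order.append(closest)
        unvisited.remove(closest)
    return order


points = [(0, 0), (1, 0), (-1.5, 0), (3, 0)]   # hypothetical check-points
greedy = nearest_neighbor_path(points, start=0)
optimum = min(path_length(points, o) for o in permutations(range(len(points))))
print(greedy, path_length(points, greedy), optimum)  # [0, 1, 3, 2] 7.5 4.5
```

The greedy step to the nearby point at x = 1 forces a long jump back to x = -1.5 at the end.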

Computation times. The question is how computation time depends on n. In very good cases, computation time scales linearly with n: it increases by a constant for each increase in n. In polynomial time, the function relating computation time to n is a polynomial, for example $CT(n) = 7n^2$.

Computation times. No matter what polynomial function we have, exponential functions such as $2^n$ will overtake it for large enough n.
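A small numerical illustration: for a few polynomials, find the point beyond which $2^n$ stays larger. The check only runs up to a finite limit, so it illustrates the claim rather than proving it; `crossover` is a hypothetical helper.

```python
def crossover(poly, limit=1000):
    """Smallest N such that 2**n > poly(n) for every n with N <= n < limit."""
    n = limit - 1
    if not 2 ** n > poly(n):
        raise ValueError("2**n has not overtaken poly(n) below the limit")
    while n > 1 and 2 ** (n - 1) > poly(n - 1):
        n -= 1
    return n


print(crossover(lambda n: 7 * n ** 2))  # 10: 2**10 = 1024 > 700, 2**9 = 512 < 567
print(crossover(lambda n: n ** 3))      # 10
print(crossover(lambda n: n ** 10))     # 59
```

Larger polynomials delay the crossover, but they do not prevent it.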

NP-complete. Computer science theory shows that there is a class of problems that appear not to have polynomial-time solutions. All these NP-complete problems are equivalent, in the sense that if one ever finds a polynomial-time solution to one of them, one can solve all of them in polynomial time. Although it has never been proven that no polynomial solution exists (the biggest open question in computer science), most people believe this to be the case.

NP-hard. There is another class of problems: the NP-hard problems, which are at least as hard as every NP-complete problem. An NP-hard problem need not belong to NP itself, so even a polynomial-time solution to the NP-complete problems would not necessarily give a polynomial-time solution to every NP-hard problem. The SHP is one such NP-hard problem!

Computing the parsimony score of a given tree is not NP-complete. [Figure: an example tree with leaves A, C, A and G.] With n sequences there are n-2 internal nodes, and hence $4^{n-2}$ possible reconstructions (assignments of nucleotides to the internal nodes). One could go over all $4^{n-2}$ assignments to find the MP score. However, we have previously shown that although this naïve solution is exponential, a linear-time algorithm exists.
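The contrast can be checked directly. This sketch compares an exhaustive search over all $4^{n-2}$ internal assignments with Fitch's algorithm (our choice of linear-time method) on a hypothetical four-taxon tree with leaf states A, C, A, G; the leaf arrangement is assumed, since the figure is not available:

```python
from itertools import product


def fitch_score(adj, states, root_edge):
    """Fitch small-parsimony score; one pass over the nodes."""
    def down(node, parent):
        if node in states:
            return {states[node]}, 0
        (sa, ca), (sb, cb) = [down(c, node) for c in adj[node] - {parent}]
        return (sa & sb, ca + cb) if sa & sb else (sa | sb, ca + cb + 1)

    u, v = root_edge
    (su, cu), (sv, cv) = down(u, v), down(v, u)
    return cu + cv + (0 if su & sv else 1)


def brute_force_score(adj, states):
    """Try all 4^(n-2) assignments of nucleotides to the internal nodes."""
    internal = sorted(node for node in adj if node not in states)
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    best = None
    for assignment in product("ACGT", repeat=len(internal)):
        full = {**states, **dict(zip(internal, assignment))}
        changes = sum(full[u] != full[v] for u, v in edges)
        best = changes if best is None else min(best, changes)
    return best


# Hypothetical four-taxon tree (internal nodes x, y) with leaf states A, C, A, G.
quartet = {"l1": {"x"}, "l2": {"x"}, "l3": {"y"}, "l4": {"y"},
           "x": {"l1", "l2", "y"}, "y": {"x", "l3", "l4"}}
leaves = {"l1": "A", "l2": "C", "l3": "A", "l4": "G"}
print(fitch_score(quartet, leaves, ("x", "y")),
      brute_force_score(quartet, leaves))  # 2 2
```

Both give the same score, but the exhaustive search grows as $4^{n-2}$ while Fitch visits each node once.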

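BNB SOLUTION TO SHP. [Figure: the SHP search tree.] The shortest path found so far has length 15. At this node the score is already 16, so there is no point in checking the rest of its subtree: every completion would be even longer.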
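A minimal branch-and-bound sketch for the SHP. Assumptions: Euclidean distances between hypothetical points, and a depth-first walk of the search tree. The bound is the one described above: a partial route that is already at least as long as the best complete route cannot lead to a shorter one.

```python
from math import dist, inf


def bnb_shortest_path(points):
    """Branch and bound for the shortest Hamiltonian path (any start point)."""
    n = len(points)
    best_len, best_route, nodes_visited = inf, None, 0

    def extend(route, length):
        nonlocal best_len, best_route, nodes_visited
        nodes_visited += 1
        if length >= best_len:
            return                                  # bound: prune this subtree
        if len(route) == n:
            best_len, best_route = length, route    # new best complete route
            return
        last = points[route[-1]]
        for nxt in range(n):                        # branch: each unvisited point
            if nxt not in route:
                extend(route + [nxt], length + dist(last, points[nxt]))

    for start in range(n):
        extend([start], 0.0)
    return best_route, best_len, nodes_visited


points = [(2, 0), (0, 0), (4, 0), (1, 0), (3, 0)]  # hypothetical, on a line
route, length, visited = bnb_shortest_path(points)
print(route, length, visited)
```

The full search tree for 5 points has 5 + 20 + 60 + 120 + 120 = 325 nodes; the bound lets the search skip part of it while still returning a shortest route (length 4.0 here).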

Back to finding the MP tree: finding the MP tree is NP-hard. BNB helps, though it is still exponential in the worst case.

[Figures: the MP search tree, built by adding taxa one at a time to a branch of the current tree (e.g., taxon 4 added to one of its branches), followed by a step-by-step MP-BNB walkthrough of this search tree; the best record during the first steps is 52.]

MP-BNB: the best tree has an MP score of 42. Total trees visited: 14.

MP-BNB, an improvement: evaluate all 3 trees of the first level before going deeper. Total trees visited: 9. The "bound" after searching this subtree will be 42.
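A minimal sketch of MP branch and bound, assuming Fitch scoring and stepwise addition as the search tree (an unrooted tree is stored as (first taxon, rooted tree of the others)). The bound relies on the fact that adding a taxon can never lower the parsimony score. Setting `best_first=True` mimics the improvement above: evaluate all trees at a level first, then descend into the cheapest one. The alignment is made up, so the counts will not match the slides.

```python
def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def fitch(tree, seqs, i):
    """Fitch pass for alignment column i: (candidate state set, changes)."""
    if isinstance(tree, str):
        return {seqs[tree][i]}, 0
    (a, ca), (b, cb) = fitch(tree[0], seqs, i), fitch(tree[1], seqs, i)
    return (a & b, ca + cb) if a & b else (a | b, ca + cb + 1)


def parsimony(tree, seqs):
    columns = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, i)[1] for i in range(columns))


def branch_and_bound(taxa, seqs, best_first=True):
    """Exact MP search by stepwise addition; returns (tree, score, evaluated)."""
    outgroup, first, second, *rest = taxa
    best_score, best_tree, evaluated = float("inf"), None, 1

    def search(backbone, score, depth):
        nonlocal best_score, best_tree, evaluated
        if score >= best_score:
            return                      # bound: more taxa cannot lower the score
        if depth == len(rest):
            best_score, best_tree = score, (outgroup, backbone)
            return
        children = [(parsimony((outgroup, t), seqs), t)
                    for t in insertions(backbone, rest[depth])]
        evaluated += len(children)
        if best_first:                  # look at the cheapest child first
            children.sort(key=lambda child: child[0])
        for child_score, child in children:
            search(child, child_score, depth + 1)

    start = (first, second)
    search(start, parsimony((outgroup, start), seqs), 0)
    return best_tree, best_score, evaluated


seqs = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTTA",   # hypothetical alignment
        "D": "TGCTTA", "E": "TGCAGA", "F": "TGCAGC"}
for best_first in (False, True):
    print(branch_and_bound(list("ABCDEF"), seqs, best_first))
```

Without pruning, six taxa would require scoring 1 + 3 + 15 + 105 = 124 trees; both orderings return an MP tree while usually scoring fewer.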