Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Doug Brutlag Professor.

1 Computational Molecular Biology, Biochem 218 – BioMedical Informatics. Doug Brutlag, Professor Emeritus, Biochemistry & Medicine (by courtesy). Phylogenies

2 Cladogram Representation of Phylogenies
[Figure: cladogram with leaves A B C D E F G H I J L M N O P Q; axis label "Years x"]

3 Dendrogram Representation of Phylogenies
[Figure: GrowTree Phylogram, February 1, 2010; axis: Substitutions per 100 Residues]

4 Cladogram

5 Phenogram

6 Curve-O-Gram

7 Eurogram

8 RadialGram

9 Methods for Determining Phylogenies
Parsimony (character based)
o Assigns mutations to branches
o Minimizes number of edits
o Topology maximizes similarity of neighboring leaves
Distance methods
o Branch lengths = D(i,j)/2 for sequences i, j
o Distances must be at least metric
o Distances can reflect time or edits
o Distance must be relatively constant per unit branch length
[Figure: tree with leaves A B C D E F G H I J L M N O P Q]

10 Methods for Determining Phylogenies
Parsimony
o Minimum mutation (Fitch, PAUP)
o Minimal length encoding
Probabilistic
o Branch and Bound
o Maximum likelihood
Distance methods
o Ultrametric Trees
o Additive Trees
o UPGMA
o Neighbor Joining
[Figure: tree with leaves A B C D E F G H I J L M N O P Q]

11 Properties of Trees
Rooted or Unrooted
Nodes and Branches
o Internal Nodes
o External Nodes - leaves
Operational Taxonomic Units
Outgroups
Topology
One path/pair
Distances
[Figure: distance matrix and example trees over X, Y, Z with root R]

12 Orthologous Evolution

13 Paralogous Evolution hemoglobins

14 Challenges Making Trees: Gene Duplication versus Speciation

15 Orthology and Paralogy
Orthologs: Derived by speciation
Paralogs: Gene Duplications
[Figure: gene tree with leaves HB Human, WB Worm, HA1 Human, HA2 Human, Yeast, WA Worm]
Thanks to Serafim Batzoglou

16 Challenges Making Trees: Gene Conversion
[Figure: nucleotide sequences illustrating gene conversion]
Thanks to Maryellen Ruvolo

17 Challenges Making Trees: Gene Conversion
[Figure: nucleotide sequences illustrating gene conversion, continued]
Thanks to Maryellen Ruvolo

18 Challenges Making Trees: Gene Conversion
Orthologs: Derived by speciation
Paralogs: Gene Duplications
[Figure: trees for Gene N and Gene M with leaves A, B, C, D]
Thanks to Maryellen Ruvolo

19 Challenges Making Trees: C_M Has Been Converted from C_N
Orthologs: Derived by speciation
Paralogs: Gene Duplications
[Figure: tree for Gene N and Gene M with leaves A_N, B_N, C_M, C_N, D_N, A_M, B_M, D_M]
Thanks to Maryellen Ruvolo

20 Consensus CG/LH Tree Thanks to Maryellen Ruvolo

21 Gene conversion between 1st & 2nd exons of LH, CG2 Genes
[Figure: LH Gene and CG2 Gene; segments of 168 nt and 15 nt; regions labeled No Conversion and Conversion]
Thanks to Maryellen Ruvolo

22 Challenges Making Trees: Varying Rates of Mutation

23 Challenges Making Trees: Horizontal Gene Transfer

24 Maximum Ultrametric Distance Trees
Matrix D is ultrametric for tree T if:
o D is a symmetric n by n matrix of distances
o T contains n leaves, one from each row or column
o Each node of T is labeled by one entry from D
o Numbers from root to leaves strictly decrease
o For any two leaves i, j, D(i,j) labels the nearest common ancestor of i and j in the tree
[Figure: Matrix D and Tree T]

25 Maximum Ultrametric Distance Trees
A symmetric matrix D is ultrametric if and only if for every three leaves i, j, and k, there is a tie for the maximum distance among D(i,j), D(i,k) and D(j,k).
[Figure: tree labeled U, V, I, J, K]
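The three-leaf condition on this slide can be checked directly. The following is a minimal sketch, assuming D is given as a symmetric list of lists; the function name `is_ultrametric` and the tolerance parameter `tol` are illustrative choices, not part of the lecture.

```python
from itertools import combinations

def is_ultrametric(D, tol=1e-9):
    """Return True if every triple of leaves has a tie for the maximum distance.

    D is assumed to be a symmetric square matrix (list of lists).
    """
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must be equal.
        smallest, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True
```

For example, `is_ultrametric([[0, 2, 6], [2, 0, 6], [6, 6, 0]])` holds because the two largest distances in the only triple are both 6.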

26 Additive Distance Trees
[Figure: Matrix D and Tree T with leaves A, B, C, D]

27 Distance Metrics Obey the Triangle Inequality
D(i,j) ≤ D(i,k) + D(j,k) for all i, j, k
(Max Score - Smith-Waterman Score) is a Metric if
o Gap-penalty ≥ 1 + Gap-size/(n-1)
o Assuming match = 1 and mismatch = -1
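As an illustration of the inequality above, here is a small sketch that tests it over every triple of a distance matrix. It assumes D is a symmetric list of lists; the function name and the tolerance `tol` are hypothetical. It does not test the other metric axioms and does not address the Smith-Waterman gap-penalty condition.

```python
def satisfies_triangle_inequality(D, tol=1e-9):
    """Check D[i][j] <= D[i][k] + D[j][k] for all i, j, k."""
    n = len(D)
    return all(
        D[i][j] <= D[i][k] + D[j][k] + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )
```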

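28 Three Leaf Tree
Observe: D1,2, D1,3, D2,3
Calculate: L1,A, L2,A, L3,A
[Figure: three-leaf tree with internal node A and branches L1,A, L2,A, L3,A]

29 Three Leaf Tree
Observe: D1,2, D1,3, D2,3
Calculate: L1,A, L2,A, L3,A
D1,2 = L1,A + L2,A
D1,3 = L1,A + L3,A
D2,3 = L2,A + L3,A
[Figure: three-leaf tree with internal node A and branches L1,A, L2,A, L3,A]

30 Solution to Three Species Tree
[Figure: three-leaf tree with internal node A and branches L1,A, L2,A, L3,A]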
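The solution itself did not survive in the transcript, but it follows from the three equations on the previous slide: adding the two equations that contain a given branch and subtracting the third leaves twice that branch length. A short sketch under that derivation (the function name is illustrative):

```python
def three_leaf_branches(d12, d13, d23):
    """Solve D1,2 = L1,A + L2,A, D1,3 = L1,A + L3,A, D2,3 = L2,A + L3,A."""
    l1a = (d12 + d13 - d23) / 2  # (D1,2 + D1,3 - D2,3) / 2
    l2a = (d12 + d23 - d13) / 2  # (D1,2 + D2,3 - D1,3) / 2
    l3a = (d13 + d23 - d12) / 2  # (D1,3 + D2,3 - D1,2) / 2
    return l1a, l2a, l3a
```

With branch lengths 1, 2 and 3 the observed distances are 3, 4 and 5, and the function recovers the original lengths.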

31 Four Species Tree
Calculate: L1,A, L2,A, L3,B, L4,B, LA,B
Observe: D1,2, D1,3, D1,4, D2,3, D2,4, D3,4
[Figure: four-leaf tree with internal nodes A and B]

32 Four Species Topology
Label species 1, 2, 3, and 4 so that:
D(1,2) + D(3,4) ≤ D(1,3) + D(2,4) = D(1,4) + D(2,3)
[Figure: four-leaf tree with internal nodes A and B]
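One way to apply this labeling rule is to compute the three pairwise sums and pick the pairing with the smallest one. The sketch below assumes D is indexable as `D[x][y]`; the function name and the returned `additive` flag (whether the two larger sums tie, as the slide requires) are illustrative additions.

```python
def four_point_topology(D, a, b, c, d, tol=1e-9):
    """Return (pairing, additive) for four taxa a, b, c, d.

    pairing is the split, e.g. ((a, b), (c, d)), with the smallest distance sum;
    additive is True when the two larger sums are equal within tol.
    """
    sums = {
        ((a, b), (c, d)): D[a][b] + D[c][d],
        ((a, c), (b, d)): D[a][c] + D[b][d],
        ((a, d), (b, c)): D[a][d] + D[b][c],
    }
    ranked = sorted(sums.items(), key=lambda item: item[1])
    best_pairing = ranked[0][0]
    additive = abs(ranked[1][1] - ranked[2][1]) <= tol
    return best_pairing, additive
```

The taxa in the first pair of the result play the roles of species 1 and 2 on the slide, and the second pair the roles of species 3 and 4.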

33 Solution for Four Species
L1,A = 1/4*(D1,3 + D1,4 - D2,3 - D2,4) + 1/2*D1,2
L2,A = 1/4*(D2,3 + D2,4 - D1,3 - D1,4) + 1/2*D1,2
LB,3 = 1/4*(D1,3 + D2,3 - D1,4 - D2,4) + 1/2*D3,4
LB,4 = 1/4*(D1,4 + D2,4 - D1,3 - D2,3) + 1/2*D3,4
LA,B = 1/4*(D1,3 + D1,4 + D2,3 + D2,4) - 1/2*(D1,2 + D3,4)
[Figure: four-leaf tree with internal nodes A and B]
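The five formulas above translate directly into code. This sketch assumes the species are already labeled so that 1 and 2 join at node A and 3 and 4 join at node B; the function name and dictionary keys are illustrative.

```python
def four_species_branches(d12, d13, d14, d23, d24, d34):
    """Branch lengths for the tree ((1,2),(3,4)) from the six pairwise distances."""
    return {
        "L1,A": (d13 + d14 - d23 - d24) / 4 + d12 / 2,
        "L2,A": (d23 + d24 - d13 - d14) / 4 + d12 / 2,
        "LB,3": (d13 + d23 - d14 - d24) / 4 + d34 / 2,
        "LB,4": (d14 + d24 - d13 - d23) / 4 + d34 / 2,
        "LA,B": (d13 + d14 + d23 + d24) / 4 - (d12 + d34) / 2,
    }
```

As a check, a tree with L1,A = 1, L2,A = 2, LB,3 = 3, LB,4 = 4 and LA,B = 5 gives distances D1,2 = 3, D1,3 = 9, D1,4 = 10, D2,3 = 10, D2,4 = 11, D3,4 = 7, and the formulas return the original branch lengths.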

34 Four Species => Three Topologies
[Figure: the three possible unrooted trees for species 1, 2, 3, 4 with internal nodes A and B]

35 Species, Distances, Branches & Topologies

36

37 Number of Topologies for n Species
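The table for these slides is not in the transcript. As a hedged sketch, the standard counts for strictly bifurcating trees are n(n-1)/2 pairwise distances, 2n-3 branches in an unrooted tree, (2n-5)!! unrooted topologies and (2n-3)!! rooted topologies; whether the slide tabulated exactly these quantities is an assumption. The function names are illustrative.

```python
def double_factorial(m):
    """Product m * (m-2) * (m-4) * ... down to 1; returns 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def tree_counts(n):
    """Counts for bifurcating trees on n >= 3 species."""
    if n < 3:
        raise ValueError("need at least three species")
    return {
        "pairwise_distances": n * (n - 1) // 2,
        "unrooted_branches": 2 * n - 3,
        "unrooted_topologies": double_factorial(2 * n - 5),
        "rooted_topologies": double_factorial(2 * n - 3),
    }
```

For n = 4 this gives 6 distances, 5 branches and 3 unrooted topologies, matching the four-species slides above.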

38 UPGMA: Unweighted Pair Group Method with Arithmetic Average
Where D1,(34) = (D1,3 + D1,4)/2 and D2,(34) = (D2,3 + D2,4)/2
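The averaging rule above is the first step of UPGMA when leaves 3 and 4 are merged. A minimal sketch of the full clustering loop follows, assuming a symmetric list-of-lists matrix and using a size-weighted average for later merges (which reduces to the slide's formula when both merged clusters are single leaves). The function name, the nested-tuple tree representation and the returned root height are illustrative choices, not taken from GrowTree.

```python
def upgma(names, D):
    """Cluster leaves with UPGMA.

    names: list of leaf labels; D: symmetric distance matrix (list of lists).
    Returns (tree, height): a nested tuple and the height of the root.
    """
    n = len(names)
    # Active clusters: id -> (subtree, number of leaves, height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        tree_i, size_i, _ = clusters.pop(i)
        tree_j, size_j, _ = clusters.pop(j)
        # Distance from the new cluster to each remaining cluster:
        # arithmetic average over all leaf pairs (size-weighted mean).
        merged = {}
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        # The new node sits at half the distance between the merged clusters.
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j, dij / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height
```

UPGMA places every leaf at the same distance from the root, so the result is only faithful when the input distances are close to ultrametric.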

39 UPGMA Dendrogram

40 UPGMA Clustering

41 Neighbor Joining Method
[Figure: star tree being resolved by joining neighbors i and j through new nodes X and Y, with n leaves]
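The slide's figure is lost, so the following is a sketch of the commonly used formulation of neighbor joining rather than a reconstruction of the slide: pick the pair minimizing Q(i,j) = (n-2)*D(i,j) - r(i) - r(j), where r is the row sum, attach both to a new node, and replace them by that node. The function name, the edge-list output and the generated internal node names `U0`, `U1`, ... are assumptions for illustration.

```python
def neighbor_joining(names, D):
    """Build an unrooted tree by neighbor joining.

    names: list of at least two leaf labels (assumed not to look like 'U0', 'U1', ...).
    D: symmetric distance matrix (list of lists).
    Returns a list of (node, neighbor, branch_length) edges.
    """
    nodes = list(names)
    d = {(a, b): D[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names) if i != j}
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[(a, b)] for b in nodes if b != a) for a in nodes}
        pairs = ((a, b) for idx, a in enumerate(nodes) for b in nodes[idx + 1:])
        i, j = min(pairs, key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = f"U{counter}"
        counter += 1
        # Branch lengths from i and j to the new internal node u.
        li = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[(a, b)]))
    return edges
```

Unlike UPGMA, this procedure does not assume a constant rate along all branches, and on an additive distance matrix it recovers the branch lengths of the underlying tree.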

42 Nearest Neighbor Dendrogram
[Figure: dendrogram with labels A, B, C, D, E, R, W, X, Y, Z]

43 New Hampshire Standard Tree
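The New Hampshire format (also known as Newick) writes a tree as nested parentheses terminated by a semicolon. As a small sketch, assuming a tree is held as nested tuples of leaf names (a representation chosen here for illustration, without branch lengths), it can be serialized like this:

```python
def to_newick(tree):
    """Serialize a nested-tuple tree, e.g. (("3", "4"), ("1", "2")), to Newick text."""
    def render(node):
        if isinstance(node, tuple):
            # Internal node: comma-separated children inside parentheses.
            return "(" + ",".join(render(child) for child in node) + ")"
        return str(node)  # Leaf: just its label.
    return render(tree) + ";"
```

In the full format a branch length can follow any node as `:length`; this sketch omits lengths to stay short.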

44 SeqWeb GrowTree Program

45 GrowTree Parameters

46 GrowTree Distances

47 GrowTree Phylogram (UPGMA)

48 GrowTree Alignment

49 GrowTree Neighbor Joining Tree

50 GrowTree VegF Input

51 GrowTree VegF Neighbor Joining Tree

52 VegF Growth Factors

53 GrowTree VegF UPGMA Tree

54 GrowTree VegF Alignment

