Why do trees?


1 Why do trees?

2 Phylogeny 101
OTUs: operational taxonomic units (species, populations, individuals)
Nodes, internal: often ancestors
Nodes, external: terminal, often living species or individuals
Branches: length scaled
Branches: length unscaled, nominal, arbitrary
Outgroup: an OTU that is most distantly related to all the other OTUs in the study

3 Phylogeny 102
Trees, rooted: $N = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Trees, unrooted: $N = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

OTUs   # rooted trees   # unrooted trees
   2                1                  1
   3                3                  1
   4               15                  3
   5              105                 15
   6              945                105
   7            10395                945
   8           135135              10395
   9          2027025             135135
  10         34459425            2027025
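The two counting formulas on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and treating n = 2 as a single unrooted tree is an assumption made to match the table (the unrooted formula itself only applies for n >= 3).

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted, bifurcating trees for n OTUs (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted, bifurcating trees for n OTUs (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    if n == 2:
        return 1  # assumption: a single branch joining the two OTUs
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    print("OTUs  rooted  unrooted")
    for n in range(2, 11):
        print(n, rooted_trees(n), unrooted_trees(n))
```

The number of unrooted trees for n OTUs equals the number of rooted trees for n - 1 OTUs, which is why the two table columns are shifted copies of each other.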

4 Trees: NJ
Distance matrix
UPGMA assumes constant rate of evolution – molecular clock: don’t publish UPGMA trees
Neighbor joining is very fast
Often a “good enough” tree
Embedded in ClustalW
Use in publications only if too many taxa to compute with MP or ML
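To make the neighbor-joining idea concrete, here is a sketch of the first NJ step only: compute the usual Q criterion from a distance matrix, pick the pair to join, and estimate the two branch lengths to the new node. The 5 x 5 distance matrix is made-up illustrative data, not taken from the slides, and `nj_first_join` is a hypothetical helper name.

```python
def nj_first_join(dist):
    """Return the first pair NJ would join and their branch lengths.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal
    and at least three rows.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark pairs that are close to each
            # other and far from everything else.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from i and j to the newly created internal node.
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (i, j), (len_i, len_j)


# Illustrative distances for five taxa a..e (assumed data).
EXAMPLE = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(nj_first_join(EXAMPLE))
```

Unlike UPGMA, the two branch lengths need not be equal, which is why NJ does not require a molecular clock.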

5 Distances from sequence
Protdist/DNAdist
Uncorrected distance: non-identical residues / total sequence length
Correction for multiple hits is necessary because two identical residues may hide changes such as C -> T -> C
Jukes-Cantor assumes all substitutions are equally likely
Kimura: transition rate differs from transversion rate
Ts usually > Tv
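The distance measures named on this slide can be sketched as follows. The formulas are the standard Jukes-Cantor and Kimura two-parameter corrections; the function names are illustrative, and the two sequences are OTU1 and OTU2 from the small example alignment used later in the deck (far too short for a real estimate, but enough to show the arithmetic).

```python
from math import log, sqrt

PURINES = {"A", "G"}


def ts_tv_proportions(a, b):
    """Return (P, Q): proportions of transitions and transversions.

    Positions with a gap in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    ts = sum(1 for x, y in pairs
             if x != y and (x in PURINES) == (y in PURINES))
    tv = sum(1 for x, y in pairs
             if x != y and (x in PURINES) != (y in PURINES))
    return ts / len(pairs), tv / len(pairs)


def p_distance(a, b):
    """Uncorrected distance: non-identical sites / compared sites."""
    p_ts, p_tv = ts_tv_proportions(a, b)
    return p_ts + p_tv


def jukes_cantor(p):
    """Jukes-Cantor correction; valid for p < 0.75."""
    return -0.75 * log(1 - 4 * p / 3)


def kimura_2p(p_ts, p_tv):
    """Kimura two-parameter correction from transition/transversion proportions."""
    return -0.5 * log((1 - 2 * p_ts - p_tv) * sqrt(1 - 2 * p_tv))


if __name__ == "__main__":
    otu1, otu2 = "AAGAGTGCA", "AGCCGTGCG"
    p_ts, p_tv = ts_tv_proportions(otu1, otu2)
    print("p-distance:", p_distance(otu1, otu2))
    print("Jukes-Cantor:", jukes_cantor(p_distance(otu1, otu2)))
    print("Kimura 2P:", kimura_2p(p_ts, p_tv))
```

Both corrected distances come out larger than the raw proportion of differences, which is the point of correcting for multiple hits.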

6 Trees: MP
Maximum parsimony
Minimum # mutations to construct tree
Better than NJ – information lost in distance matrix – but much slower
Sensitive to long-branch attraction
No explicit evolutionary model
Protpars refuses to estimate branch lengths
Informative sites

7 Trees: ML
Very CPU intensive
Requires explicit model of evolution – rate and pattern of nucleotide substitution
  – JC: Jukes/Cantor
  – K2P: Kimura 2 parameter, transition/transversion
  – F81: Felsenstein, base composition bias
  – HKY85: merges K2P and F81
Explicit model -> preferred statistically
Assumes change more likely on long branch
No long-branch attraction
Wrong model -> wrong tree

8 Models of sequence evolution: HKY85
[4 x 4 substitution rate matrix over A, C, G, T: each row lists the rates of change from one base to the other three; the rate and frequency symbols are missing from this transcript]
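As a sketch of what an HKY85 rate matrix looks like, the code below builds one under a common parameterisation: the rate from base i to base j is proportional to the equilibrium frequency of j, multiplied by a transition/transversion ratio `kappa` when the change is a transition. The symbols on the original slide are not recoverable, so the parameter names here (`freqs`, `kappa`) and the unscaled form of the matrix are assumptions.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(freqs, kappa):
    """Build an unscaled HKY85 instantaneous rate matrix.

    freqs maps each base to its equilibrium frequency (summing to 1);
    kappa is the transition/transversion rate ratio.
    Rows and columns are ordered A, C, G, T.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, src in enumerate(BASES):
        for j, dst in enumerate(BASES):
            if i != j:
                rate = kappa if (src, dst) in TRANSITIONS else 1.0
                q[i][j] = rate * freqs[dst]
        # Diagonal is set so that each row sums to zero.
        q[i][i] = -sum(q[i])
    return q


if __name__ == "__main__":
    freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    for base, row in zip(BASES, hky85_rate_matrix(freqs, kappa=2.0)):
        print(base, [round(x, 3) for x in row])
```

With `kappa = 1` the matrix reduces to F81 (base composition only), and with equal base frequencies it reduces to K2P, which is the sense in which HKY85 merges the two.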

9 Here we have a representative alignment. Want to determine the phylogenetic relationships among the OTUs:

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

It is a good alignment, clearly aligning homologous sites without gaps.

10 There are 3 possible trees for 4 taxa (OTUs):

1       3         1       2         1       2
 \_____/           \_____/           \_____/
 /     \           /     \           /     \
2       4         3       4         4       3

Or (1,2)(3,4), (1,3)(2,4) and (1,4)(2,3)
Aim to identify (phylogenetically) informative sites and use these to determine which tree is most parsimonious.

11 The identical sites 1, 6, 8 are useless for phylogenetic purposes.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

12 Site 2 also useless: OTU1’s A could be grouped with any of the Gs.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

13 Site 4 is uninformative as each OTU has a different base, UNLESS transitions are weighted differently from transversions, in which case it favours (1,4)(2,3): A/G pair in OTUs 1 and 4, C/T pair in OTUs 2 and 3.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

14 For site 3 each tree can be made with (minimum) 2 mutations:

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

15 (1,2)(3,4)

G       A         G       A         G       A
 \     /           \     /           \     /
  G---A             C---A             A---A
 /     \           /     \           /     \
C       A         C       A         C       A

16 (1,3)(2,4)        can do worse:

G       C         G       C
 \     /           \     /
  A---A             G---A
 /     \           /     \
A       A         A       A

17 (1,4)(2,3)

G       C
 \     /
  A---A
 /     \
A       A

So site 3 is (counterintuitively) NOT informative

18 Site 5, however, is informative because one tree is shortest.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *

19
(1,2)(3,4)        (1,3)(2,4)        (1,4)(2,3)
G       A         G       G         G       G
 \     /           \     /           \     /
  G---A             A---A             G---G
 /     \           /     \           /     \
G       A         A       A         A       A

20 Likewise sites 7 and 9. By majority rule, the most parsimonious tree is (1,2)(3,4), supported by 2 of the 3 informative sites.

Site: 1 2 3 4 5 6 7 8 9
OTU1  A A G A G T G C A
OTU2  A G C C G T G C G
OTU3  A G A T A T C C A
OTU4  A G A G A T C C G
      *         *   *
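The site-by-site reasoning of the last few slides can be reproduced with a short script. It is a sketch, not what Protpars does internally: it flags parsimony-informative sites (at least two states, each present in at least two OTUs) and scores the three four-taxon topologies with Fitch's set-intersection rule. All function and variable names are illustrative.

```python
from collections import Counter

ALIGNMENT = {
    "OTU1": "AAGAGTGCA",
    "OTU2": "AGCCGTGCG",
    "OTU3": "AGATATCCA",
    "OTU4": "AGAGATCCG",
}

# The three unrooted topologies for four taxa, each written as two pairs.
TOPOLOGIES = {
    "(1,2)(3,4)": (("OTU1", "OTU2"), ("OTU3", "OTU4")),
    "(1,3)(2,4)": (("OTU1", "OTU3"), ("OTU2", "OTU4")),
    "(1,4)(2,3)": (("OTU1", "OTU4"), ("OTU2", "OTU3")),
}

N_SITES = len(ALIGNMENT["OTU1"])


def is_informative(site):
    """True if at least two states each occur in at least two OTUs."""
    counts = Counter(seq[site] for seq in ALIGNMENT.values())
    return sum(1 for c in counts.values() if c >= 2) >= 2


def fitch_merge(a, b):
    """Fitch step: intersect if possible (cost 0), otherwise union (cost 1)."""
    common = a & b
    return (common, 0) if common else (a | b, 1)


def site_cost(topology, site):
    """Minimum number of mutations needed at one site on one topology."""
    total = 0
    pair_sets = []
    for first, second in topology:
        merged, cost = fitch_merge({ALIGNMENT[first][site]},
                                   {ALIGNMENT[second][site]})
        pair_sets.append(merged)
        total += cost
    # Cost of the internal branch joining the two pairs.
    _, cost = fitch_merge(pair_sets[0], pair_sets[1])
    return total + cost


# Sites are reported 1-based, as on the slides.
informative_sites = [s + 1 for s in range(N_SITES) if is_informative(s)]
tree_lengths = {
    name: sum(site_cost(topo, s) for s in range(N_SITES))
    for name, topo in TOPOLOGIES.items()
}
best_tree = min(tree_lengths, key=tree_lengths.get)

if __name__ == "__main__":
    print("informative sites:", informative_sites)
    print("tree lengths:", tree_lengths)
    print("most parsimonious:", best_tree)
```

Summing over all nine sites gives the same ranking as counting votes from the informative sites alone, because every uninformative site costs the same on all three trees.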

21 Protpars infile:

   8 370
BRU       MSQNSLRLVE DNSV-DKTKA LDAALSQIER
RLR       ---------- ---V-DKSKA LEAALSQIER
NGR       ---------- -MSD-DKSKA LAAALAQIEK
ECO       ---------- AIDE-NKQKA LAAALGQIEK
YPR       ---------M AIDE-NKQKA LAAALGQIEK
PSE       ---------- -MDD-NKKRA LAAALGQIER
TTH       ---------- -MEE-NKRKS LENALKTIEK
ACD       ---------- -MDEPGGKIE FSPAFMQIEG

22 Protpars treefile:

(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);

23 outfile:

One most parsimonious tree found:

                +-ACD
        +-------7
        !       +-TTH
      +-6
      ! !    +----PSE
      ! +----5
    +-3      ! +-YPR
    ! !      +-4
    ! !        +-ECO
  +-2 !
  ! ! +-------------NGR
--1 !
  ! +----------------RLR
  !
  +-------------------BRU

remember: this is an unrooted tree!

requires a total of 853.000

24 Clustalw

****** PHYLOGENETIC TREE MENU ******

    1. Input an alignment
    2. Exclude positions with gaps?        = ON
    3. Correct for multiple substitutions? = ON
    4. Draw tree now
    5. Bootstrap tree
    6. Output format options

    S. Execute a system command
    H. HELP
    or press [RETURN] to go back to main menu

25 ClustalW NJ

(((ACD:0.28958,TTH:0.32705):0.03395,
((BRU:0.07321,RLR:0.07032):0.11692,NGR:0.21168):0.02493):0.02092,
(ECO:0.05022,YPR:0.05736):0.11997,
PSE:0.15632);

topologically the same as
(((ACD,TTH),((BRU,RLR),NGR)),(ECO,YPR),PSE);

and cf. Protpars:
(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);
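Comparing two Newick strings by eye is error-prone because the same unrooted tree can be written in many ways. One way to compare them, sketched here, is to list the bipartitions (splits) each tree induces on the set of OTUs and take the symmetric difference, which is the Robinson-Foulds distance. The small parser below is an assumption-laden helper written for this example: it handles only leaf names, parentheses, commas and numeric branch lengths, not the full Newick grammar.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names.

    Branch lengths are discarded; internal node labels are not supported.
    """
    s = re.sub(r":\s*[0-9.eE+-]+", "", text)   # drop branch lengths
    s = re.sub(r"\s+", "", s).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def leaves(tree):
    """Set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so rooting and child order do not matter.
    """
    all_leaves = frozenset(leaves(tree))
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(below) <= len(all_leaves) - 2:
            found.add(below if ref not in below else all_leaves - below)
        return below

    walk(tree)
    return found


def rf_distance(newick_a, newick_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(splits(parse_newick(newick_a)) ^ splits(parse_newick(newick_b)))


NJ_TREE = ("(((ACD:0.28958,TTH:0.32705):0.03395,((BRU:0.07321,RLR:0.07032)"
           ":0.11692,NGR:0.21168):0.02493):0.02092,(ECO:0.05022,YPR:0.05736)"
           ":0.11997,PSE:0.15632);")
PROTPARS_TREE = "(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);"

if __name__ == "__main__":
    print("RF distance, NJ vs Protpars:", rf_distance(NJ_TREE, PROTPARS_TREE))
```

Run on the two trees from the slides, the distance is 0: read as unrooted trees, the NJ and Protpars results contain exactly the same five internal splits, even though the Newick strings look different.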

26 NJ vs ProtPars

27 Dealing with CDSs
More info in DNA than proteins
Systematic 3rd position changes can confuse
Use DNA directly only if evolutionary distance is short
For distant relationships: blank 3rd positions
Translate into protein to align
  – then copy gaps back to DNA
Use dnadist with weights to investigate rates
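Two of the steps on this slide are easy to sketch in code: threading an unaligned coding sequence back onto its aligned protein (one codon per residue, three gap characters per protein gap), and blanking third codon positions. The function names and the use of `N` as the blanking character are assumptions for illustration; the slide does not name a tool for either step.

```python
def copy_gaps_to_dna(aligned_protein, cds, gap="-"):
    """Spread a CDS over an aligned protein, codon by codon.

    aligned_protein: protein sequence with gap characters from the alignment.
    cds: the ungapped coding sequence that encodes it, in frame.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for residue in aligned_protein if residue != gap)
    if n_residues > len(codons):
        raise ValueError("CDS is too short for the protein sequence")
    remaining = iter(codons)
    return "".join(
        gap * 3 if residue == gap else next(remaining)
        for residue in aligned_protein
    )


def blank_third_positions(aligned_dna, blank="N", gap="-"):
    """Replace every third codon position with `blank`, leaving gaps alone."""
    return "".join(
        blank if index % 3 == 2 and base != gap else base
        for index, base in enumerate(aligned_dna)
    )


if __name__ == "__main__":
    dna = copy_gaps_to_dna("M-K", "ATGAAA")
    print(dna)
    print(blank_third_positions(dna))
```

Aligning at the protein level first keeps codons intact, so the third positions line up and can be masked or down-weighted consistently across sequences.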

28 Trees
General guidelines – NOT rules
More data is better
Excellent alignment = few informative sites
Exclude unreliable data – toss all gaps?
Use seqs/sites evolving at appropriate rate
  – Phylip DISTANCE
  – 3rd positions saturated
  – 2nd positions invariant
  – Fast evolving seqs for closely related taxa
  – Eliminate transition - homoplasy

29 Trees
Beware base composition bias in unrelated taxa
Are sites (hairpins?) independent?
Are substitution rates equal across dataset?
Long branches prone to error – remove them?

