Bill Bruno Brian Foley Thomas Leitner Theoretical Biology & Biophysics

1 A short introduction to the theory and practice of phylogenetic inference
Bill Bruno, Brian Foley, Thomas Leitner
Theoretical Biology & Biophysics, Los Alamos National Laboratory

2 Overview
Introduction & Alignments - Brian Foley
Distance-based Methods, Models & Tree search - Bill Bruno
Character-based Methods, Bootstrap & Molecular Clock - Thomas Leitner
Hands-On Work Time
Group Discussion

3 From Data: To Phylogenetic Tree: www.t10.lanl.gov
myosin/trees/trees.html

4 Multiple Sequence Alignments
Choose the data set
Select an appropriate outgroup
  Next closest relative to group(s) under study
  Still close enough to align well
Create the alignment
  Get sequences in the right format (FASTA, for example)
  Use a program (CLUSTALW, HMMER, DIALIGN)
  Hand-edit the alignment (BioEdit, SeAl, MASE, JALview)
  Remove uncertain columns (gaps, for example)

5 Pairwise Alignments
Typical settings include gap open and gap extension penalties
Dynamic Programming Algorithm is fast and efficient
BLAST (Basic Local Alignment Search Tool) does a poor job with pairs that contain many in/dels
BLAST scores depend on length as well as % identity
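
As an illustration of dynamic programming with separate gap open and gap extension penalties, here is a minimal sketch of a global alignment score with affine gaps. The slide does not say which algorithm or scoring values were meant; the three-matrix recurrence, the function name `affine_global_score` and the default scores below are assumptions chosen for the example.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score of a and b with affine gap penalties.

    Assumed scoring: the first position of a gap costs gap_open,
    each further position of the same gap costs gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these assumed scores, one gap of length two is cheaper than two separate gaps of length one, which is the behaviour the two penalties are meant to produce. The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths.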

6 Multiple Sequence Alignments
NEVER blindly trust a machine-made alignment; always view the entire alignment with an alignment editor (BioEdit, SeAl, MASE, JALview) and adjust or trim questionable regions
Consider gaps and IUPAC ambiguity codes (R = purine, etc.) and how the phylogenetic software will treat them; stripping columns with these characters is one option
Gene reorganization presents a problem for genome-sized regions
Phylogenetic comparison can only be done on the region of overlap of all sequences in the alignment
Multiple Sequence Alignment Software
  ProbCons
  TreeAlign, Methods in Enzymology 183:
  ClustalW, Methods in Enzymology 266:
  MALIGN, Journal of Heredity 85:
  HMMER, GeneDoc, GCG Wisconsin Package, TAAR, Ctree, DAMBE, POY, ALIGN, DNASIS, etc.

7 Distance-based methods
Alignment + Model → Pairwise Distance Matrix → Tree
With more than 3 taxa, tree distances are overdetermined, so find the best tree
What is "best"?
  Ideally, distance through tree = pairwise distances
  Optimality conditions: minimum evolution, least squares, Weighbor...
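
To see why tree distances become overdetermined, it may help to count unknowns and constraints. The sketch below assumes a fully resolved unrooted tree, which has 2n - 3 branches for n taxa, and compares that with the n(n - 1)/2 pairwise distances; the function names are hypothetical.

```python
def pairwise_distance_count(n_taxa):
    """Number of distinct taxon pairs, i.e. entries in the distance matrix."""
    return n_taxa * (n_taxa - 1) // 2

def unrooted_branch_count(n_taxa):
    """Number of branches in a fully resolved unrooted tree (assumes n_taxa >= 2)."""
    return 2 * n_taxa - 3

for n in (3, 4, 5, 10):
    print(n, pairwise_distance_count(n), unrooted_branch_count(n))
```

For 3 taxa the two counts are equal, so branch lengths can reproduce the distances exactly. From 4 taxa onward there are more distances than branches, and an optimality criterion is needed to choose the tree that fits best.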

8 Substitution models
: ModelTest via web
Evolutionary Distance = rate × evolutionary time
Distance of 1.0 means on average one change per site
Depends on model of evolution, except for short distances (when there is never more than 1 change per site, no homoplasy)

9 Correcting for multiple events
T   Sequence      d   D
    AATAG GAATA   0   0
    ACTAG GAATA   1   1
    ACTAG GGATA   2   2
    AATAG GGATA   1   3
    AAAAG GAATA   1   5
    AAAAA GAACA   3   7
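
The table shows the observed number of differences (d) falling behind the number of events that actually occurred (D). A model-based correction tries to recover the hidden changes. The slides do not name a specific correction here; the sketch below assumes the Jukes-Cantor model, d_JC = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance (expected changes per site).

    Assumption: equal base frequencies and equal rates for all substitutions.
    The correction is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# First and last sequences from the table, with the spacing removed
start = "AATAGGAATA"
end = "AAAAAGAACA"
p = p_distance(start, end)
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the observed one, and the gap widens as sequences diverge. With only ten sites, as in the table, any such estimate is very noisy.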

10 Distance Tree Methods
Extremely fast
Can be unbiased, robust
Weighbor is most rigorous; FastME can give excellent but biased results
Suitable for other problems: UPGMA
[Figure: Weighbor, Fitch-Margoliash, BioNJ, FastME and NJ placed along two axes, from more reliable to less robust and from faster to slower]

11 Searching for the best tree
There are (2n - 5)! / (2^(n-3) (n - 3)!) unrooted trees for n ≥ 3 taxa
Thus, for larger datasets not all trees can be tested
Exhaustive search
Heuristic search
  Stepwise addition
  Star decomposition
  Branch swapping
Algorithmic trees
Other aspects of tree space
  Random trees
  Consensus trees
  Unresolved trees

# TAXA   # TREES
2        1
4        3
5        15
10       2 E6
22       3 E23
50       3 E74
100      2 E182
10 E6    5 E
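
The growth of tree space can be checked directly. This sketch assumes the standard count of fully resolved unrooted trees, (2n - 5)! / (2^(n-3) (n - 3)!) for n ≥ 3 taxa, which reproduces the small entries of the table (3 trees for 4 taxa, 15 for 5). The function name is hypothetical.

```python
from math import factorial

def unrooted_tree_count(n_taxa):
    """Number of distinct fully resolved unrooted trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("formula applies to 3 or more taxa")
    return factorial(2 * n_taxa - 5) // (2 ** (n_taxa - 3) * factorial(n_taxa - 3))

for n in (4, 5, 10, 22, 50, 100):
    # Scientific notation keeps the larger counts readable
    print(n, f"{unrooted_tree_count(n):.0e}")
```

Python integers have arbitrary precision, so the counts are exact even for 100 taxa. The numbers make clear why exhaustive search is only feasible for a handful of sequences.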

12 Character-based methods
Use the aligned sequences directly to calculate a tree according to an optimality criterion:
Maximum parsimony (DNAPARS, PAUP*, MEGA, etc.)
  Discriminates using parsimony-informative sites
  Selects the tree that requires the fewest steps to explain the alignment
Maximum likelihood (DNAML, PAUP*, PAML, etc.)
  Requires an explicit model of character evolution
  Calculates likelihood for each state at all sites
  Selects the tree with the highest overall likelihood (least negative log likelihood value)

13 Maximum Parsimony
Taxon   Alignment
        12345 67890
1       GATCC TAGGC
2       GGTCA CATGT
3       GGTCA TATCT
O       GATAC CAGCA
[Figure: three candidate trees (A, B, C) for taxa 1, 2, 3 and outgroup O; character 2 (states A and G) mapped onto a tree with inferred ancestral states; table of steps per tree and their sum; the maximum parsimony tree]
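
Counting steps per tree can be sketched with Fitch's small-parsimony algorithm, which is one common way to do it and is not named on the slide. The alignment below is copied from the slide, but the row labels "2" and "3" are assumed, since only "1" and "O" survive in the transcript. Trees are written as nested tuples; the root position is arbitrary and does not change the step count.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, steps) for one site on a binary tree."""
    if isinstance(tree, str):          # a tip: its state is known
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    shared = left_set & right_set
    if shared:                         # children can agree: no extra change
        return shared, left_steps + right_steps
    # children cannot agree: one change is required on this part of the tree
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, alignment):
    """Total number of steps over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total

# Alignment from the slide; taxon labels "2" and "3" are assumed
alignment = {
    "1": "GATCCTAGGC",
    "2": "GGTCACATGT",
    "3": "GGTCATATCT",
    "O": "GATACCAGCA",
}
candidate_trees = [
    (("1", "O"), ("2", "3")),
    (("1", "2"), ("3", "O")),
    (("1", "3"), ("2", "O")),
]
for tree in candidate_trees:
    print(tree, tree_length(tree, alignment))
```

Under these assumptions the tree that groups taxa 2 and 3 needs 10 steps and the other two need 12 each, so it would be the maximum parsimony tree. Only sites where at least two states each occur in at least two taxa distinguish between the trees; the rest add the same number of steps to all of them.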

14 Bootstrapping
[Figure: bootstrap 50% majority-rule consensus tree of 13 taxa, followed by the bipartitions found in one or more trees and their frequency of occurrence (bootstrap support values); 100 groups at (relative) frequency less than 5% not shown]
Non-parametric bootstrap
  Calculate a tree under a model using a tree building method
  Create pseudo replicates of the alignment
  Recalculate a tree for each pseudo replicate
  Compute a consensus tree of all pseudo trees
  Tests the reliability/robustness of the model-method
  Biased (usually conservative)
Parametric bootstrap
  Tests the evolutionary model and process
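
The "create pseudo replicates" step of the non-parametric bootstrap can be sketched as resampling alignment columns with replacement, keeping the number of sites unchanged. The function name `bootstrap_replicate` and the dictionary layout of the alignment are assumptions for illustration; tree building and consensus are left to whatever program is being used.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Return a pseudo replicate: columns drawn with replacement from the alignment.

    alignment maps taxon name -> aligned sequence (all the same length).
    rng is a random.Random instance, so replicates can be made reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every taxon, so columns stay intact
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}

# Hypothetical toy alignment
toy = {"t1": "GATCCTAGGC", "t2": "GGTCACATGT", "t3": "GGTCATATCT"}
rng = random.Random(1)
replicates = [bootstrap_replicate(toy, rng) for _ in range(100)]
```

Each replicate would then be passed through the same model and tree building method as the original data, and the support for a group is the fraction of replicate trees that contain it.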

15 The molecular clock
Assumes ultra-metric data/tree
Genetic distance-time relationships

16 The molecular clock
Evolutionary model important
Rate variation
[Figure: genetic distance plotted against time]

17 Hands-on
Open file in BioEdit
  Manually check & correct alignment
Calculate distance-matrix tree
  Calculate matrix with DNADIST
  Calculate tree with NEIGHBOR
Calculate character-based tree
  DNAPARS or DNAML
Calculate bootstrap support
  Use SEQBOOT, DNADIST, NEIGHBOR, CONSENSE
View tree in TreeView

18 Group discussion
Pros & Cons
Where to spend your time & effort
What else is available

