Why do trees?

Phylogeny 101
OTUs: operational taxonomic units (species, populations, individuals)
Nodes, internal: often ancestors
Nodes, external: terminal, often living species or individuals
Branches: length scaled, or unscaled (nominal, arbitrary)
Outgroup: an OTU that is most distantly related to all the other OTUs in the study

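Phylogeny 102
Trees, rooted: $N = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Trees, unrooted: $N = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
where $n$ is the number of OTUs.
Table columns: OTUs, #rooted trees, #unrooted trees (the values did not survive in the transcript).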
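The two counting formulas on this slide can be evaluated directly. The following is a minimal sketch; the function names `n_rooted` and `n_unrooted` are illustrative and not from the slides.

```python
from math import factorial


def n_rooted(n):
    """Number of rooted bifurcating trees for n OTUs (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of unrooted bifurcating trees for n OTUs (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    print("OTUs  #rooted  #unrooted")
    for n in (3, 4, 5, 10):
        print(n, n_rooted(n), n_unrooted(n))
```

For 4 OTUs this gives 15 rooted and 3 unrooted trees, which matches the three 4-taxon trees used in the parsimony example later in the deck.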

Trees NJ
Distance matrix
UPGMA assumes constant rate of evolution (molecular clock): don't publish UPGMA trees
Neighbor joining is very fast
Often a "good enough" tree
Embedded in ClustalW
Use in publications only if too many taxa to compute with MP or ML

Distances from sequence
Protdist/DNAdist
Uncorrected distance = non-identical residues / total sequence length
Correction for multiple hits is necessary because two identical residues may hide changes such as C -> T -> C
Jukes-Cantor assumes all substitutions are equally likely
Kimura: transition rate ≠ transversion rate
Ts usually > Tv
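As a rough illustration of the distance measures named on this slide, the sketch below computes the uncorrected distance, the Jukes-Cantor correction and the Kimura 2-parameter distance for two aligned, gap-free DNA sequences. It uses the standard textbook formulas, not the Protdist/DNAdist code, and the function names are hypothetical. The example pair is OTU1 and OTU2 from the alignment used later in the deck.

```python
from math import log, sqrt

PURINES = {"A", "G"}


def count_changes(a, b):
    """Return (transitions, transversions) between two aligned, gap-free sequences."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (x in PURINES) == (y in PURINES):
            ts += 1
        else:
            tv += 1
    return ts, tv


def p_distance(a, b):
    """Non-identical residues divided by total sequence length."""
    ts, tv = count_changes(a, b)
    return (ts + tv) / len(a)


def jukes_cantor(a, b):
    """Jukes-Cantor distance: all substitutions equally likely."""
    p = p_distance(a, b)
    return -0.75 * log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """Kimura 2-parameter distance: separate transition and transversion rates."""
    ts, tv = count_changes(a, b)
    P, Q = ts / len(a), tv / len(a)
    return -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))


if __name__ == "__main__":
    otu1, otu2 = "AAGAGTGCA", "AGCCGTGCG"
    print(p_distance(otu1, otu2), jukes_cantor(otu1, otu2), kimura_2p(otu1, otu2))
```

Both corrected distances come out larger than the uncorrected one, which is the point of correcting for multiple hits.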

Trees MP
Maximum parsimony
Minimum # mutations to construct tree
Better than NJ (information is lost in the distance matrix) but much slower
Sensitive to long-branch attraction
No explicit evolutionary model
Protpars refuses to estimate branch lengths
Informative sites

Trees ML
Very CPU intensive
Requires explicit model of evolution: rate and pattern of nucleotide substitution
– JC: Jukes/Cantor
– K2P: Kimura 2-parameter, transition/transversion
– F81: Felsenstein, base composition bias
– HKY85: merges K2P and F81
Explicit model -> preferred statistically
Assumes change more likely on long branch
No long-branch attraction
Wrong model -> wrong tree

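Models of sequence evolution: HKY85
$$
Q = \begin{array}{c|cccc}
  & A & C & G & T \\ \hline
A & \cdot & \beta\pi_C & \alpha\pi_G & \beta\pi_T \\
C & \beta\pi_A & \cdot & \beta\pi_G & \alpha\pi_T \\
G & \alpha\pi_A & \beta\pi_C & \cdot & \beta\pi_T \\
T & \beta\pi_A & \alpha\pi_C & \beta\pi_G & \cdot
\end{array}
$$
The Greek symbols were lost in extraction; the usual HKY85 notation is assumed here: $\alpha$ is the transition rate, $\beta$ the transversion rate and $\pi_X$ the frequency of base X.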
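To make the structure of the HKY85 matrix concrete, here is a small sketch that builds it from base frequencies and two rates. The symbols alpha (transitions) and beta (transversions) follow the usual textbook convention, which is an assumption because the slide's own symbols did not survive. The function name and the example parameter values are hypothetical.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(pi, alpha, beta):
    """Return Q as a dict of dicts; Q[x][y] is the rate of change from x to y."""
    q = {}
    for x in BASES:
        row = {
            y: (alpha if is_transition(x, y) else beta) * pi[y]
            for y in BASES
            if y != x
        }
        # Diagonal entry makes each row sum to zero.
        row[x] = -sum(row.values())
        q[x] = row
    return q


if __name__ == "__main__":
    # Illustrative values only: biased base composition, Ts rate twice Tv rate.
    pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    Q = hky85_rate_matrix(pi, alpha=2.0, beta=1.0)
    for x in BASES:
        print(x, [round(Q[x][y], 2) for y in BASES])
```

Setting all base frequencies to 0.25 reduces this to K2P, and setting alpha equal to beta reduces it to F81, which is the sense in which HKY85 merges the two.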

Here we have a representative alignment. We want to determine the phylogenetic relationships among the OTUs:
Site:  1 2 3 4 5 6 7 8 9
OTU1   A A G A G T G C A
OTU2   A G C C G T G C G
OTU3   A G A T A T C C A
OTU4   A G A G A T C C G
               *   *   *
It is a good alignment, clearly aligning homologous sites without gaps.

There are 3 possible unrooted trees for 4 taxa (OTUs):
1       3    1       2    1       2
 \_____/      \_____/      \_____/
 /     \      /     \      /     \
2       4    3       4    4       3
or (1,2)(3,4), (1,3)(2,4) and (1,4)(2,3).
The aim is to identify (phylogenetically) informative sites and use these to determine which tree is most parsimonious.

The invariant sites 1, 6 and 8 are useless for phylogenetic purposes.

Site 2 is also useless: OTU1's A could be grouped with any of the Gs.

Site 4 is uninformative because every OTU has a different base, UNLESS transitions are weighted, in which case it supports (1,4)(2,3).

For site 3 each tree can be made with a minimum of 2 mutations:

(1,2)(3,4)
G       A    G       A    G       A
 \     /      \     /      \     /
  G---A        C---A        A---A
 /     \      /     \      /     \
C       A    C       A    C       A

(1,3)(2,4)                 can do worse:
G       C                  G       C
 \     /                    \     /
  A---A                      G---A
 /     \                    /     \
A       A                  A       A

(1,4)(2,3)
G       C
 \     /
  A---A
 /     \
A       A
So site 3 is (counterintuitively) NOT informative.

Site 5, however, is informative because one tree is shortest.

(1,2)(3,4)    (1,3)(2,4)    (1,4)(2,3)
G       A     G       G     G       G
 \     /       \     /       \     /
  G---A         A---A         G---G
 /     \       /     \       /     \
G       A     A       A     A       A

Likewise sites 7 and 9 are informative. By majority rule the most parsimonious tree is (1,2)(3,4), supported by 2 of the 3 informative sites.
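The site-by-site reasoning above can be checked mechanically. The sketch below scores every column of the example alignment on each of the three 4-taxon trees using Fitch-style set operations, then reports which sites are informative and which tree needs the fewest changes overall. This is an illustration for the 4-OTU case only, not what Protpars does internally, and all names are hypothetical.

```python
ALIGNMENT = {
    "OTU1": "AAGAGTGCA",
    "OTU2": "AGCCGTGCG",
    "OTU3": "AGATATCCA",
    "OTU4": "AGAGATCCG",
}

# Each tree is two pairs of OTU indices joined by one internal branch.
TREES = {
    "(1,2)(3,4)": ((0, 1), (2, 3)),
    "(1,3)(2,4)": ((0, 2), (1, 3)),
    "(1,4)(2,3)": ((0, 3), (1, 2)),
}


def fitch_pair(x, y):
    """Combine two state sets; a change is counted when they do not overlap."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def site_cost(column, tree):
    """Minimum number of mutations for one alignment column on one tree."""
    (a, b), (c, d) = tree
    left, cost_left = fitch_pair({column[a]}, {column[b]})
    right, cost_right = fitch_pair({column[c]}, {column[d]})
    _, cost_mid = fitch_pair(left, right)
    return cost_left + cost_right + cost_mid


columns = ["".join(col) for col in zip(*ALIGNMENT.values())]
costs = {name: [site_cost(col, t) for col in columns] for name, t in TREES.items()}

# A site is informative when the three trees do not all need the same number of changes.
informative = [
    i + 1 for i in range(len(columns)) if len({costs[name][i] for name in TREES}) > 1
]
totals = {name: sum(per_site) for name, per_site in costs.items()}
best = min(totals, key=totals.get)

if __name__ == "__main__":
    print("informative sites:", informative)
    print("tree lengths:", totals)
    print("most parsimonious:", best)
```

Running it reproduces the slides' conclusion: sites 5, 7 and 9 are the informative ones, and (1,2)(3,4) is the shortest tree.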

Protpars infile:
BRU         MSQNSLRLVE DNSV-DKTKA LDAALSQIER
RLR                       V-DKSKA LEAALSQIER
NGR                     MSD-DKSKA LAAALAQIEK
ECO                    AIDE-NKQKA LAAALGQIEK
YPR                  M AIDE-NKQKA LAAALGQIEK
PSE                     MDD-NKKRA LAAALGQIER
TTH                     MEE-NKRKS LENALKTIEK
ACD                     MDEPGGKIE FSPAFMQIEG

Protpars treefile:
(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);

outfile:
One most parsimonious tree found:
[ASCII tree diagram not recoverable from the transcript; its topology is the one given in the treefile, (((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);]
remember: this is an unrooted tree!
requires a total of

ClustalW
****** PHYLOGENETIC TREE MENU ******
1. Input an alignment
2. Exclude positions with gaps? = ON
3. Correct for multiple substitutions? = ON
4. Draw tree now
5. Bootstrap tree
6. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu

ClustalW NJ
(((ACD: , TTH: ) : , ((BRU: , RLR: ) : , NGR: ) : ) : , (ECO: , YPR: ) : , PSE: );
topologically the same as
(((ACD,TTH),((BRU,RLR),NGR)),(ECO,YPR),PSE);
and cf. Protpars:
(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);

NJ vs ProtPars
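One way to compare the two results is to reduce each Newick string to its set of bipartitions (splits) of the taxa, since an unrooted topology is fully described by its non-trivial splits. The sketch below does this for the two topologies quoted above, with branch lengths omitted. The small parser only handles plain Newick without branch lengths or node labels, and every function name is hypothetical.

```python
def clades(newick):
    """Return (all taxa, leaf set under each internal node) for a plain Newick string."""
    s = newick.strip().rstrip(";").replace(" ", "")
    pos = 0
    found = []

    def parse():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # skip "("
            leaves = set()
            while True:
                leaves |= parse()
                if s[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip ")"
            found.append(frozenset(leaves))
            return leaves
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return {s[start:pos]}

    taxa = frozenset(parse())
    return taxa, found


def splits(newick):
    """Non-trivial bipartitions, each stored as the side that lacks a reference taxon."""
    taxa, found = clades(newick)
    ref = min(taxa)
    result = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
    return result


NJ = "(((ACD,TTH),((BRU,RLR),NGR)),(ECO,YPR),PSE);"
PROTPARS = "(((((ACD,TTH),(PSE,(YPR,ECO))),NGR),RLR),BRU);"

if __name__ == "__main__":
    differing = splits(NJ) ^ splits(PROTPARS)
    print("splits that differ:", len(differing))
```

For these two strings no splits differ, so although the Newick strings look different, the NJ and Protpars trees describe the same unrooted topology; only the position of the arbitrary root in the written form changes.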

Dealing with CDSs
More info in DNA than proteins
Systematic 3rd position changes can confuse
Use DNA directly only if evolutionary distance is short
For distant relationships: blank 3rd positions
Translate into protein to align
– then copy gaps back to DNA
Use dnadist with weights to investigate rates
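Two of the steps on this slide are easy to sketch: copying the gaps from a protein alignment back onto the coding DNA, and blanking third codon positions. The code below is a minimal illustration and not the tool used in the original course. The function names, the use of "N" as the blanking character and the toy sequences are assumptions.

```python
def thread_gaps(protein_aln, cds):
    """Copy gaps from an aligned protein sequence back onto its ungapped CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in protein_aln if aa != "-")
    if residues != len(codons):
        raise ValueError("protein and CDS lengths do not correspond")
    remaining = iter(codons)
    # Each protein gap becomes a three-column gap; each residue takes the next codon.
    return "".join("---" if aa == "-" else next(remaining) for aa in protein_aln)


def blank_third_positions(dna_aln, blank="N"):
    """Replace every third codon position with a blank character, keeping gaps."""
    return "".join(
        blank if i % 3 == 2 and base != "-" else base
        for i, base in enumerate(dna_aln)
    )


if __name__ == "__main__":
    aligned = thread_gaps("M-K", "ATGAAA")
    print(aligned)
    print(blank_third_positions(aligned))
```

The sketch assumes the CDS is in frame and has no stop codon that the protein sequence lacks; real data would need those cases handled before threading.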

Trees
General guidelines – NOT rules
More data is better
Excellent alignment = few informative sites
Exclude unreliable data – toss all gaps?
Use seqs/sites evolving at appropriate rate
– Phylip DISTANCE
– 3rd positions saturated
– 2nd positions invariant
– Fast evolving seqs for closely related taxa
– Eliminate transition - homoplasy

Trees
Beware base composition bias in unrelated taxa
Are sites (hairpins?) independent?
Are substitution rates equal across dataset?
Long branches prone to error – remove them?