Phylogenetics and Coalescence Lab 9 October 24, 2012.


Goals
1. Construct phylogenetic trees using the UPGMA method.
2. Use nucleotide sequences to construct phylogenetic trees using the UPGMA, NJ, and Maximum Parsimony methods.
3. Use coalescent simulation to determine historical change in Ne.
4. Interpret coalescent trees to draw inferences about human migrations.

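Phylogenetic methods: scope of the problem
The number of possible unrooted trees for n OTUs is $(2n-5)!! = \frac{(2n-5)!}{2^{n-3}(n-3)!}$.
For 10 taxa there are 2,027,025 possible unrooted trees, so exhaustive comparison quickly becomes impractical and we need an optimality criterion to choose among trees.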
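A minimal Python sketch of the tree count, using (2n - 5)!! = (2n - 5)! / (2^(n-3) (n - 3)!) for unrooted, fully bifurcating trees with labeled tips. The function name `n_unrooted_trees` is ours, for illustration only.

```python
from math import factorial

def n_unrooted_trees(n):
    """Number of unrooted, fully bifurcating trees with n labeled tips (n >= 3).

    Computes (2n - 5)!! as (2n - 5)! / (2**(n - 3) * (n - 3)!) with exact integers.
    """
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in (4, 5, 10, 20):
    print(n, n_unrooted_trees(n))
```

Integer division keeps the result exact, which matters because the counts grow super-exponentially with n.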

Phylogenetic methods
A. Distance methods
1. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
2. Neighbor Joining (NJ)
3. Minimum Evolution (ME)
B. Character-based methods
1. Maximum Parsimony (MP)
2. Maximum Likelihood (ML)
3. Bayesian method (BA)

UPGMA
Step 1: Generate data (sequence, genotype, or morphological) for each OTU.
Taxa         Sequence
Human        TGCGTAT
Chimpanzee   TGGGTAT
Gorilla      TGCGCTT
Orangutan    TGCTGTG
Gibbon       TAGTAGC

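Distance can be calculated using different substitution models: 1. number of nucleotide differences, 2. p-distance, 3. Jukes-Cantor (JC) distance, 4. Kimura 2-parameter (K2P) distance, 5. F81, 6. HKY85, 7. GTR, etc.
Step 2: Calculate the p-distance (the proportion of sites that differ) for all pairs of taxa. For example, Human and Chimpanzee differ only at site 3 (C vs. G):
Site         1 2 3 4 5 6 7
Human        T G C G T A T
Chimpanzee   T G G G T A T
so p = 1/7 ≈ 0.143.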
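A minimal Python sketch of Step 2 for the five sequences from Step 1, also showing the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) as one of the listed alternatives. Function names are ours, for illustration. With only 7 sites, two pairs (Human-Gibbon and Gorilla-Gibbon) have p ≥ 0.75, where the JC formula is undefined; the sketch reports infinity for them.

```python
import math
from itertools import combinations

# Sequences from Step 1
SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
    "Orangutan": "TGCTGTG",
    "Gibbon": "TAGTAGC",
}

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc_distance(p):
    """Jukes-Cantor distance -3/4 * ln(1 - 4p/3); infinite (saturated) when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

for a, b in combinations(SEQS, 2):
    p = p_distance(SEQS[a], SEQS[b])
    print(f"{a:>10} vs {b:<10} p = {p:.3f}  JC = {jc_distance(p):.3f}")
```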

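Step 3: Calculate the distance matrix for all pairs of taxa (Hu = Human, Ch = Chimpanzee, Go = Gorilla, Or = Orangutan, Gi = Gibbon) and join the pair with the minimum distance as a new OTU. Human and Chimpanzee are the closest pair (p = 1/7 ≈ 0.143), so they are joined first, each on a branch of length 0.0714 (half their distance).
[Figure: distance matrix and tree joining Human and Chimpanzee]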

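Step 4: Recalculate the distance matrix, treating Human and Chimpanzee as one OTU (Hu+Ch). In UPGMA, the distance from the new OTU to each remaining taxon is the average of its members' distances, e.g. d(Hu+Ch, Go) = [d(Hu, Go) + d(Ch, Go)] / 2.
[Figure: original and recalculated distance matrices]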

Step 5: Select the pair of OTUs with the minimum distance as a new OTU: Gorilla joins (Human, Chimpanzee).
[Figure: tree of Human, Chimpanzee, and Gorilla]

Step 6: Again merge the pair with the minimum distance into a new OTU, (Hu+Ch)+Go, and recalculate the distance matrix against Orangutan and Gibbon.
[Figure: previous and recalculated distance matrices]

Step 7: Again select the pair of OTUs with the minimum distance as a new OTU: Orangutan joins ((Human, Chimpanzee), Gorilla).
[Figure: tree of Chimpanzee, Human, Gorilla, and Orangutan]

Step 8: Again merge the pair with the minimum distance into a new OTU, ((Hu+Ch)+Go)+Or, and recalculate the distance matrix; only Gibbon remains outside it.
[Figure: previous and recalculated distance matrices]

Step 9: Join the last two OTUs to make the final rooted tree, ((((Human, Chimpanzee), Gorilla), Orangutan), Gibbon).
[Figure: final UPGMA tree of Chimpanzee, Human, Gorilla, Orangutan, and Gibbon]
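Steps 3 to 9 can be sketched end to end in Python. This is a minimal illustration, not MEGA's implementation: it assumes p-distances and standard UPGMA, where the distance from a merged cluster to another cluster is the size-weighted average of its members' distances and each new node sits at half the joining distance. Cluster labels are Newick-style strings; all names are ours.

```python
from itertools import combinations

SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
    "Orangutan": "TGCTGTG",
    "Gibbon": "TAGTAGC",
}

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def upgma(seqs):
    """UPGMA on p-distances.

    Returns (topology, merges): topology is a Newick string without branch
    lengths; merges lists (cluster_a, cluster_b, distance, node_height) in
    the order clusters were joined.
    """
    # cluster label -> (number of taxa, creation order used to fix label order)
    info = {name: (1, i) for i, name in enumerate(seqs)}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    order = len(seqs)
    merges = []
    while len(info) > 1:
        # Join the closest pair; the larger (then older) cluster is written first
        pair = min(dist, key=dist.get)
        a, b = sorted(pair, key=lambda c: (-info[c][0], info[c][1]))
        d_ab = dist.pop(pair)
        (size_a, _), (size_b, _) = info.pop(a), info.pop(b)
        new = f"({a},{b})"
        merges.append((a, b, d_ab, d_ab / 2))  # node height = half the distance
        # Distance from the new cluster = size-weighted average of its members' distances
        for k in info:
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            dist[frozenset((new, k))] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        info[new] = (size_a + size_b, order)
        order += 1
    (root,) = info
    return root + ";", merges

tree, merges = upgma(SEQS)
for a, b, d, h in merges:
    print(f"join {a} + {b}: distance {d:.3f}, node height {h:.4f}")
print(tree)
```

On these five sequences the sketch joins Human+Chimpanzee (p = 1/7), then Gorilla (5/14), Orangutan (4/7), and Gibbon (11/14), giving the same topology as Step 9. These intermediate distances come from the sketch, not from the original slides.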

Branch support
1. Bootstrap support
2. Jackknife support
3. Bremer support
4. Posterior probability support

Bootstrap support, Step 1: Randomly resample the alignment columns (sites) with replacement to make n pseudo-replicate data sets of the same length, and build a tree from each replicate. Example data sets for three taxa:
Taxa         Set 1     Set 2     Set 3
Human        GGCGATT   TCTATGG   TGCGTAT
Chimpanzee   GGGGATT   TGTATGG   TGGGTAT
Gorilla      GGCGTTT   TCCTTGG   TGCGCTT
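A minimal Python sketch of the resampling in Step 1, assuming an alignment stored as a dict of taxon name to aligned sequence. The function name and seed are ours; tree building on each replicate is left out here.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    alignment: dict taxon -> aligned sequence, all of the same length.
    Returns a new dict with the same taxa and the same sequence length.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]  # site indices, drawn with replacement
    # The same column indices are used for every taxon, so sites stay aligned
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

ALN = {"Human": "TGCGTAT", "Chimpanzee": "TGGGTAT", "Gorilla": "TGCGCTT"}
rng = random.Random(1)  # fixed seed so the example is reproducible
replicates = [bootstrap_replicate(ALN, rng) for _ in range(3)]
for rep in replicates:
    print(rep)
```

The key design point is that whole columns are resampled, never individual characters, so each replicate is still a valid alignment.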

Bootstrap support, Step 2: Make a consensus tree from the trees obtained from all pseudo-replicates. The bootstrap support for a clade is the percentage of replicate trees that contain it.
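A minimal Python sketch of the counting behind a majority-rule consensus, assuming rooted replicate trees written as nested tuples (unrooted NJ trees would use bipartitions instead). The four replicate trees below are made up for illustration only; they are not results from the lab data.

```python
from collections import Counter

def collect_clades(tree, out):
    """Add the leaf set of every internal node to `out`; return this subtree's leaf set."""
    if isinstance(tree, str):  # a leaf (taxon name)
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves

def clade_support(trees, threshold=0.5):
    """Percent of trees containing each clade, keeping clades found in more than `threshold` of trees.

    The trivial clade containing all taxa is dropped.
    """
    counts = Counter()
    for tree in trees:
        found = set()
        all_taxa = collect_clades(tree, found)
        found.discard(all_taxa)
        counts.update(found)
    n = len(trees)
    return {clade: 100 * c / n for clade, c in counts.items() if c / n > threshold}

# Hypothetical replicate trees, for illustration only
replicate_trees = [
    ((("Human", "Chimpanzee"), "Gorilla"), "Orangutan"),
    ((("Human", "Chimpanzee"), "Gorilla"), "Orangutan"),
    ((("Human", "Gorilla"), "Chimpanzee"), "Orangutan"),
    ((("Human", "Chimpanzee"), "Orangutan"), "Gorilla"),
]
for clade, pct in clade_support(replicate_trees).items():
    print(sorted(clade), f"{pct:.0f}%")
```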

Phylogenetic software available
1. PAUP
2. PHYLIP
3. MrBayes
4. MEGA

Problem 1. File mt_primates.meg contains the sequence data used to calculate the genetic distances in Example 1. Use MEGA to build phylogenetic trees based on: 1. UPGMA, 2. the NJ method, 3. Maximum Parsimony. Compute bootstrap confidence in the internal nodes of each tree. Compare the trees derived using each of these methods. Which do you think is the most informative? Does the computational efficiency of the UPGMA method come at the cost of misleading results in this case?

Problem 2. File pdha1_human.meg contains haplotypes detected by sequencing a 4.2-kb region of the X-linked Pyruvate Dehydrogenase E1 α Subunit (PDHA1) gene in 16 African and 19 non-African males. Use MEGA to build a phylogenetic tree based on the NJ method and interpret the results in light of hypotheses about the origin of modern humans (see Example 11.4, p , as well as p. 618 in Hedrick 2005).

Coalescence: the Wright-Fisher model
Until now we have implicitly used the Wright-Fisher model, but simulating it forward in time for every individual in a population is computationally expensive.
[Figure: Wright-Fisher model]
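The coalescent avoids that cost by following only the sampled lineages backward in time. A minimal Python sketch for two gene copies, assuming a constant diploid population of N individuals (2N gene copies) with no selection; the function name and parameter values are ours.

```python
import random

def wf_coalescence_time(N, rng):
    """Generations back until two gene copies coalesce under the Wright-Fisher model.

    Assumes a constant diploid population of N individuals (2N gene copies):
    every generation, each lineage picks its parent copy uniformly at random
    from the 2N copies in the previous generation.
    """
    two_n = 2 * N
    generations = 0
    while True:
        generations += 1
        # The two lineages coalesce if they pick the same parent copy
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generations

rng = random.Random(2012)
N = 50
times = [wf_coalescence_time(N, rng) for _ in range(4000)]
print(sum(times) / len(times))  # should be close to 2N = 100
```

Only two random draws per generation are needed, instead of one per individual, which is the computational saving the coalescent exploits.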

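The discrete coalescent (diploid population of N individuals, i.e. 2N gene copies)
Probability that two genes have their MRCA j generations ago (no coalescence for j - 1 generations, then coalescence in the jth generation): $P(T_2 = j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$.
Probability that 2 genes out of k have a common ancestor in a given generation: approximately $\binom{k}{2}\frac{1}{2N}$.
Probability of no coalescence among k lineages for j - 1 generations, then coalescence in the jth generation: $P(T_k = j) \approx \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{j-1} \binom{k}{2}\frac{1}{2N}$.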
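A minimal Python sketch of these geometric probabilities, assuming the usual approximation C(k, 2)/(2N) for the per-generation coalescence probability among k lineages (exact for k = 2). Function names are ours.

```python
from math import comb

def p_coalesce_per_generation(k, N):
    """Approximate probability that some pair among k lineages coalesces in one generation.

    Uses C(k, 2) / (2N) for a diploid population of N (2N gene copies);
    multiple simultaneous coalescences are ignored, so it is exact only for k = 2.
    """
    return comb(k, 2) / (2 * N)

def p_first_coalescence_at(j, k, N):
    """No coalescence for j - 1 generations, then coalescence in generation j."""
    p = p_coalesce_per_generation(k, N)
    return (1 - p) ** (j - 1) * p

def expected_wait(k, N):
    """Mean of the geometric waiting time, 1 / p = 4N / (k (k - 1)) generations."""
    return 1 / p_coalesce_per_generation(k, N)

N = 100
print(p_first_coalescence_at(1, 2, N))
print(expected_wait(2, N), expected_wait(5, N))
```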

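The continuous coalescent
For large N, the continuous exponential distribution can be derived from the discrete geometric one. The waiting time $T_k$ for k genes to have k - 1 ancestors is approximately exponential, $P(T_k > t) \approx e^{-\binom{k}{2} t/(2N)}$, with mean $4N/[k(k-1)]$ generations (see Math Box 3.2 in Hamilton, 2009).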
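A minimal Python sketch of a coalescent simulation built from these exponential waiting times, measuring time in units of 2N generations so that k lineages coalesce at rate C(k, 2). It checks the simulated time to the most recent common ancestor against the exact mean, the sum of 1/C(k, 2), which equals 2(1 - 1/n). Names, sample size, and seed are ours.

```python
import random
from math import comb

def simulate_tmrca(n, rng):
    """TMRCA of n genes under the standard coalescent, in units of 2N generations.

    While k lineages remain, the wait until the next coalescence is
    exponential with rate C(k, 2).
    """
    total = 0.0
    for k in range(n, 1, -1):
        total += rng.expovariate(comb(k, 2))
    return total

def expected_tmrca(n):
    """Exact mean: sum over k of 1/C(k, 2) = 2 (1 - 1/n), in units of 2N generations."""
    return 2 * (1 - 1 / n)

rng = random.Random(42)
n = 10
sims = [simulate_tmrca(n, rng) for _ in range(20000)]
print(sum(sims) / len(sims), expected_tmrca(n))
```

Multiplying by 2N converts these times to generations; for n = 2 the mean is 2N generations, matching the discrete model.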

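The continuous coalescent
A model of population size change (e.g. exponential growth or a bottleneck) underlies the coalescent process.
[Figure: coalescent genealogies under exponential growth and a bottleneck (Hein et al. 2005)]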

Coalescent applications
Coalescent genealogies, and the mutations on them, can depend on the product of Ne and μ, the migration rate, selection, and the recombination rate. Applications:
- estimating recombination rates
- estimating historical migration rates between populations
- estimating the time to the most recent common ancestor (tMRCA)
- estimating historical effective population size
- estimating the strength of selection

From data to the coalescent
Suppose we observe n genes carrying k mutations. We want to estimate θ = 4Neμ but do not know its true value. We can calculate the likelihood of θ for a range of candidate values and choose the value with the highest likelihood.
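A minimal Python sketch of this grid search, assuming the standard neutral coalescent with infinite-sites mutation and treating the k mutations as k segregating sites. It uses the standard closed form P(S = k | θ) = Σ_{i=2}^{n} (-1)^i C(n-1, i-1) (i-1)/(θ+i-1) (θ/(θ+i-1))^k. Function names and the grid are ours; the alternating sum loses precision for large n, so this is only for small samples.

```python
from math import comb

def prob_segregating_sites(k, n, theta):
    """P(S = k | theta) for n genes (neutral coalescent, infinite-sites model)."""
    total = 0.0
    for i in range(2, n + 1):
        total += ((-1) ** i * comb(n - 1, i - 1) * (i - 1) / (theta + i - 1)
                  * (theta / (theta + i - 1)) ** k)
    return total

def grid_mle(k, n, grid):
    """Return the candidate theta with the highest likelihood."""
    return max(grid, key=lambda t: prob_segregating_sites(k, n, t))

grid = [i / 100 for i in range(1, 2001)]  # candidate theta values 0.01 ... 20.00
print(grid_mle(3, 2, grid))
```

For n = 2 the likelihood reduces to θ^k / (1 + θ)^(k + 1), whose maximum is at θ = k, which is a convenient check on the grid search.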

MCMC
1. Propose a new history (topology + waiting times) from a distribution of histories.
2. Divide the likelihood of this new history by the likelihood of the last history sampled.
3. Move to the new history with probability min(1, likelihood ratio); otherwise stay at the current one.
4. Repeat steps 1-3.
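A minimal random-walk Metropolis sketch of the accept/reject rule above. To keep it short it samples θ directly instead of histories (a simplification, not what BEAST does), uses the two-gene likelihood P(k | θ) = θ^k / (1 + θ)^(k + 1) for k pairwise differences, and assumes a Uniform(0, θ_max] prior. All names, settings, and the seed are ours.

```python
import math
import random

def log_lik_pair(theta, k):
    """Log-likelihood of k differences between two genes: theta**k / (1 + theta)**(k + 1)."""
    return k * math.log(theta) - (k + 1) * math.log1p(theta)

def metropolis_theta(k, n_iter, step, theta_max, rng, theta0=1.0):
    """Random-walk Metropolis sampler for theta with a Uniform(0, theta_max] prior."""
    theta = theta0
    ll = log_lik_pair(theta, k)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # 1. propose a new value (symmetric)
        if 0.0 < proposal <= theta_max:          # proposals outside the prior are rejected
            ll_new = log_lik_pair(proposal, k)
            # 2-3. accept with probability min(1, L_new / L_old)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                theta, ll = proposal, ll_new
                accepted += 1
        samples.append(theta)                    # 4. repeat, recording the current state
    return samples, accepted / n_iter

samples, acc = metropolis_theta(k=3, n_iter=100_000, step=4.0, theta_max=20.0,
                                rng=random.Random(7))
kept = samples[1000:]  # discard burn-in
print(f"acceptance = {acc:.2f}, posterior mean of theta ~ {sum(kept) / len(kept):.2f}")
```

Because the proposal is symmetric and the prior is flat, the acceptance probability reduces to the likelihood ratio capped at 1, matching step 3.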

Problem. Fossil and molecular evidence both strongly support the divergence of the human and chimpanzee lineages approximately 6 MYA. However, the timing and location of human expansions beyond Africa have proved controversial. Use the Bayesian MCMC software BEAST to derive coalescent trees for the sequences from the X-linked Pyruvate Dehydrogenase E1-alpha subunit (PDHA1) gene that you also analyzed in Problem 2.