Molecular Systematics

Molecular Systematics
Maximum likelihood approaches are time consuming; Bayesian approaches are similar in approach but more rapid.
ML attempts to find the tree that maximizes the probability of the data given a set of trees and a model.
Bayesian analysis attempts to find the tree that maximizes the probability of the tree given the data and the model.
This was impossible until recently; advances in computational methods (MCMC) and computing speed made it practical.
It is based on Bayes' Theorem, which tells us how to update or revise beliefs a posteriori, in light of new evidence.

Molecular Systematics
A simple example: imagine a box of 100 dice, of which 90% are true and 10% are biased.
You pick a die at random and are asked to determine whether it is true or biased.
With no other information you must conclude that the probability of it being biased is 0.1.
What if you had additional information?

Molecular Systematics
Roll the die twice (in this example the rolls come up 4 and 6).
P[result | true die] = 1/6 × 1/6 = 1/36 ≈ 0.0278
P[result | biased die] = 4/21 × 6/21 ≈ 0.0544 (the biased dice are assumed to favor higher numbers, with face i landing up with probability i/21)
The probability of the die being biased given this result is higher than the probability of the die being true given this result.
Bayes' Theorem:
P[biased die | results] = P[results | biased die] × P[biased die] / (P[results | biased die] × P[biased die] + P[results | true die] × P[true die])
P[biased die | results] = 0.0544 × 0.1 / (0.0544 × 0.1 + 0.0278 × 0.9) ≈ 0.18, an increase from the original 0.1
P[true die | results] ≈ 0.82, a decrease from the original 0.9
These are the posterior probabilities that the die you chose is biased or true.
You have more information and are able to make a more informed decision.
In Bayesian phylogenetics we replace the dice with trees and attempt to maximize the posterior probability of the tree, starting from an initial tree and applying random perturbations:
P[tree | data] = P[data | tree] × P[tree] / P[data]
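As a quick check of the arithmetic on this slide, here is a minimal Python sketch of the dice calculation; the variable names and the face-i-with-probability-i/21 bias model are just the assumptions described above.

```python
# Posterior probability that the chosen die is biased, given rolls of 4 and 6.
prior_biased = 0.10
prior_true = 0.90

lik_true = (1 / 6) * (1 / 6)       # ~0.0278: a true die gives each face probability 1/6
lik_biased = (4 / 21) * (6 / 21)   # ~0.0544: assumed bias, face i has probability i/21

evidence = lik_biased * prior_biased + lik_true * prior_true
post_biased = lik_biased * prior_biased / evidence
post_true = lik_true * prior_true / evidence

print(round(post_biased, 2), round(post_true, 2))   # ~0.18 and ~0.82
```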

Molecular Systematics
The development that made Bayesian phylogenetic estimation possible is the Markov chain Monte Carlo (MCMC) method.
MCMC works by taking a series of steps that form a conceptual chain.
At each step, a new location in parameter space is proposed via random perturbation (usually a very small change).
The relative posterior probability of the new location is calculated.
If the new location has a higher posterior-probability density than the present location of the chain, the move is accepted: the proposed location becomes the next link in the chain and the cycle is repeated.
If the proposed location has a lower posterior-probability density, the move is accepted only a proportion of the time, equal to the ratio of the two densities (small steps downward are accepted often, whereas big leaps down are discouraged).
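A minimal sketch of that accept/reject rule, assuming a symmetric proposal (the plain Metropolis case); the log_post_* arguments stand for hypothetical log posterior-density values, and working in log space avoids numerical underflow with very small probabilities.

```python
import math
import random

def metropolis_accept(log_post_current, log_post_proposed):
    """Metropolis rule: uphill moves are always accepted; downhill moves
    are accepted with probability equal to the posterior-density ratio."""
    if log_post_proposed >= log_post_current:
        return True
    return random.random() < math.exp(log_post_proposed - log_post_current)
```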

Molecular Systematics
If the proposed location is rejected, the present location is added again as the next link in the chain.
By repeating this procedure millions of times, a long chain of locations in parameter space is created.
The proportion of the time that any tree (location) is visited along the course of the chain is an approximation of its posterior probability.
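The sketch below ties these two slides together on a deliberately tiny, hypothetical problem: five "trees" with made-up unnormalised posterior weights, a symmetric neighbour proposal, and a counter of visits. After many steps the visit frequencies approximate the true (normalised) posterior probabilities.

```python
import random
from collections import Counter

# Toy posterior over five "trees" (unnormalised weights; hypothetical numbers).
weights = {"T1": 1.0, "T2": 4.0, "T3": 2.0, "T4": 8.0, "T5": 5.0}
states = list(weights)

def propose(current):
    # Small random perturbation: step to a neighbouring state on a ring.
    i = states.index(current)
    return states[(i + random.choice([-1, 1])) % len(states)]

current = "T1"
visits = Counter()
for _ in range(200_000):
    candidate = propose(current)
    ratio = weights[candidate] / weights[current]
    if random.random() < min(1.0, ratio):
        current = candidate          # move accepted: candidate becomes the next link
    visits[current] += 1             # a rejected move re-counts the present location

total = sum(visits.values())
for t in states:
    # visit frequency vs. true normalised posterior probability
    print(t, round(visits[t] / total, 3), round(weights[t] / sum(weights.values()), 3))
```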

Molecular Systematics
This method suffers from the same local-optimum problem as most other hill-climbing methods.
Bayesian analyses overcome this by running several chains simultaneously, usually four.
These four chains occasionally exchange information in an effort to avoid getting trapped on less-than-optimal hills.

Molecular Systematics
A MrBayes analysis is a Metropolis-coupled MCMC (MCMCMC):
It begins by proposing eight random trees (two independent sets, or runs, of four chains each).
Within a set, all four chains randomly perturb their trees and recalculate the posterior probabilities.
One chain is considered 'cold'; this is the chain whose posterior samples are actually recorded.

Molecular Systematics
The other three chains are 'hot' (heated): they explore the same tree space, but with a flattened posterior surface, so the difference is in the magnitude of the peak heights.
Because the 'drops' between peaks are not as large on the heated chains, they are freer to explore the tree space and less likely to become trapped.
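A schematic sketch of how the coupling works. The heating scheme beta_i = 1/(1 + lambda*i) and the temperature increment value are modelled on MrBayes-style defaults but should be treated as assumptions here; log_post_i and log_post_j stand for hypothetical log posterior densities of the current states of two chains chosen for a swap.

```python
import math
import random

n_chains = 4        # one cold chain plus three heated chains (per run)
heat_lambda = 0.1   # temperature increment (an assumed, MrBayes-like value)

# Chain i samples from posterior**beta[i]; beta[0] = 1 is the cold chain,
# and only the cold chain contributes samples to the posterior estimate.
betas = [1.0 / (1.0 + heat_lambda * i) for i in range(n_chains)]

def swap_accepted(log_post_i, log_post_j, beta_i, beta_j):
    """Decide whether chains i and j exchange their current states.
    Heated chains see a flattened surface, so a swap lets the cold chain
    jump to a peak that a hot chain has wandered onto."""
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    return random.random() < math.exp(min(0.0, log_ratio))
```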

Molecular Systematics
All four chains continue, with occasional switching between them, to avoid getting caught on particular hills.
Eventually each set of runs will begin to plateau and run out of changes (even chain switches) that can improve the tree: convergence.
How do we know when this has been reached? That is where the second set of chains comes in.
Each set should converge on approximately the same tree.
The average standard deviation of split frequencies measures how similar the trees sampled by the two sets of chains are.
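One way to picture that diagnostic: for every split (bipartition) seen in either run, compare the fraction of sampled trees containing it in run 1 and in run 2, then average the spread. The helper below is a hypothetical sketch of that idea; MrBayes' reported statistic is computed in the same spirit, though its exact bookkeeping may differ.

```python
import statistics

def average_sd_of_split_frequencies(freqs_run1, freqs_run2):
    """Each argument maps a split (e.g. a label for the taxa on one side)
    to the fraction of sampled trees in that run containing the split."""
    all_splits = set(freqs_run1) | set(freqs_run2)
    sds = [
        statistics.stdev([freqs_run1.get(s, 0.0), freqs_run2.get(s, 0.0)])
        for s in all_splits
    ]
    return sum(sds) / len(sds)

# Two runs that largely agree give a small value (frequencies below are made up):
print(average_sd_of_split_frequencies(
    {"AB|CDE": 0.97, "ABC|DE": 0.61, "AC|BDE": 0.03},
    {"AB|CDE": 0.95, "ABC|DE": 0.64, "AC|BDE": 0.05},
))
```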

Molecular Systematics
Once convergence has occurred, we need to generate a consensus tree.
It is important that we do not include any of the initial (essentially random) trees, only the ones obtained after the analysis reached a plateau.
The burn-in is the set of sampled trees that we discard in favor of the (likely) more accurate trees.
The consensus tree is generated from the samples collected after the burn-in.
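A minimal sketch of that last step, assuming each sampled tree is represented simply as a set of splits and that the first 25% of samples are discarded as burn-in (a commonly used default, but an assumption here): splits present in a majority of the post-burn-in samples make up the consensus tree, and their frequencies are the posterior probabilities reported on its branches.

```python
from collections import Counter

def majority_rule_consensus_splits(sampled_trees, burnin_fraction=0.25):
    """sampled_trees: list of trees in sampling order, each a set of splits.
    Returns the splits present in more than half of the post-burn-in
    samples, with their frequencies (approximate posterior probabilities)."""
    start = int(len(sampled_trees) * burnin_fraction)   # discard the burn-in
    kept = sampled_trees[start:]
    counts = Counter(split for tree in kept for split in tree)
    n = len(kept)
    return {split: count / n for split, count in counts.items() if count / n > 0.5}
```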