Bayesian inference Presented by Amir Hadadi

Presentation transcript:

Bayesian inference Based on "Bayesian inference using Markov Chain Monte Carlo in phylogenetic studies" by Torbjörn Karfunkel Presented by Amir Hadadi Bioinformatics seminar, spring 2005

What is Bayesian inference? Definition: "an approach to statistics in which all forms of uncertainty are expressed in terms of probability" (Radford M. Neal)

Probability reminder Conditional probability: P(D∩T) = P(D|T)P(T) = P(T|D)P(D) Bayes' theorem: P(T|D) = P(D|T)P(T)/P(D) P(T|D) is called the posterior probability of T P(T) is the prior probability, that is, the probability assigned to T before seeing the data P(D|T) is the likelihood of T, which is what we try to maximize in ML P(D) is the probability of observing the data D regardless of which tree is correct
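To make the identity concrete, here is a minimal Python sketch of Bayes' theorem; the 0.3 / 0.8 / 0.5 values are purely illustrative and not taken from the example that follows:

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: P(T|D) = P(D|T) * P(T) / P(D)."""
    return likelihood * prior / evidence

# Illustrative numbers only: P(T) = 0.3, P(D|T) = 0.8, P(D) = 0.5
print(bayes_posterior(prior=0.3, likelihood=0.8, evidence=0.5))  # 0.48
```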

Bayesian inference vs. Maximum likelihood: posterior vs. likelihood probabilities
A box contains 100 dice, some fair, some biased. The probability of each face under the two models:
Observation | Fair | Biased
1 | 1/6 | 1/21
2 | 1/6 | 2/21
3 | 1/6 | 3/21
4 | 1/6 | 4/21
5 | 1/6 | 5/21
6 | 1/6 | 6/21

Example continued A die is drawn at random from the box Rolling the die twice gives us a 4 and a 6 Using the ML approach we get: P(4, 6 | Fair) = 1/6 × 1/6 ≈ 0.028 P(4, 6 | Biased) = 4/21 × 6/21 ≈ 0.054 ML conclusion: the die is biased
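The same numbers can be reproduced with a short Python sketch; the face probabilities and the two observed rolls come from the example above:

```python
from math import prod

# Face probabilities under each model: a fair die is uniform,
# while the biased die gives face k probability k/21 (1 + 2 + ... + 6 = 21).
fair = {k: 1 / 6 for k in range(1, 7)}
biased = {k: k / 21 for k in range(1, 7)}

rolls = [4, 6]  # the two observed rolls

lik_fair = prod(fair[r] for r in rolls)      # 1/36   ≈ 0.028
lik_biased = prod(biased[r] for r in rolls)  # 24/441 ≈ 0.054

# Maximum likelihood picks the model with the larger likelihood
print("ML conclusion:", "biased" if lik_biased > lik_fair else "fair")
```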

Example continued further Assume we have prior knowledge about the distribution of dice inside the box We know that the box contains 90 fair dice and 10 biased dice

Example conclusion Prior probability: fair = 0.9, biased = 0.1 Rolling the die twice gives us a 4 and a 6 Using the Bayesian approach we get: P(Biased | 4, 6) = P(4, 6 | Biased)P(Biased)/P(4, 6) ≈ 0.179 B.I. conclusion: the die is fair Conclusion: ML and BI do not necessarily agree How closely BI and ML results resemble each other depends on the strength of the prior assumptions we introduce
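A minimal sketch of the same posterior calculation, using the prior implied by the box above:

```python
# Priors implied by the box: 90 fair dice and 10 biased dice out of 100
p_fair, p_biased = 0.9, 0.1

lik_fair = (1 / 6) * (1 / 6)      # P(4, 6 | fair)
lik_biased = (4 / 21) * (6 / 21)  # P(4, 6 | biased)

# P(4, 6) = sum over both models of likelihood * prior
evidence = lik_fair * p_fair + lik_biased * p_biased

posterior_biased = lik_biased * p_biased / evidence
print(round(posterior_biased, 3))  # ≈ 0.179, so the die is probably fair
```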

Steps in B.I. Formulate a model of the problem Formulate a prior distribution which captures your beliefs before seeing the data Obtain the posterior distribution for the model parameters

B.I. in phylogenetic reconstruction Finding an evolutionary tree which explains the data (observed species) Methods of phylogenetic reconstruction: Using a model of sequence evolution, e.g. maximum likelihood Not using a model of sequence evolution, e.g. maximum parsimony, neighbor joining, etc. Bayesian inference belongs to the first category

Bayesian inference vs. Maximum likelihood The basic question in Bayesian inference: "What is the probability that this model (T) is correct, given the data (D) that we have observed?" Maximum likelihood asks a different question: "What is the probability of seeing the observed data (D) given that a certain model (T) is true?" B.I. seeks P(T|D), while ML maximizes P(D|T)

Which priors should we assume? Knowledge about a parameter can be used to approximate its prior distribution Usually we don't have prior knowledge about a parameter's distribution. In this case a flat or vague prior is assumed.

[Figure: examples of a flat prior and a vague prior]

How to find the posterior probability P(T|D)? P(T) is the assumed prior P(D|T) is the likelihood Finding P(D) is infeasible – we need to sum P(D|T)P(T) over the entire tree space Markov Chain Monte Carlo (MCMC) gives us an indirect way of finding P(T|D) without having to calculate P(D)

MCMC example [Figure omitted] P("Palestine") = 3/7, P("Tree") = 4/7, P = 1/2

Symmetric simple random walk Definition: a sequence of steps in ℤ, starting at 0 and moving one step left or right with probability ½ Properties: After n steps the average distance from 0 is of magnitude √n A random walk in one or two dimensions is recurrent A random walk in three dimensions or more is transient Brownian motion is a limit of a random walk
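A small simulation sketch of the √n property; the step count and the number of trials below are arbitrary illustrative choices:

```python
import random

def walk(n_steps: int) -> int:
    """Symmetric simple random walk on the integers, starting at 0."""
    position = 0
    for _ in range(n_steps):
        position += random.choice((-1, 1))
    return position

# The average distance from 0 after n steps grows like sqrt(n);
# for the symmetric walk it is approximately sqrt(2n / pi).
n, trials = 10_000, 500
mean_distance = sum(abs(walk(n)) for _ in range(trials)) / trials
print(mean_distance, (2 * n / 3.141592653589793) ** 0.5)  # both roughly 80
```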

Definition of a Markov chain A special type of stochastic process A sequence of random variables X_0, X_1, X_2, … such that: Each X_i takes values in a state space S = {s_1, s_2, …} If x_0, x_1, …, x_{n+1} are elements of S, then: P(X_{n+1} = x_{n+1} | X_n = x_n, X_{n-1} = x_{n-1}, …, X_0 = x_0) = P(X_{n+1} = x_{n+1} | X_n = x_n)
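A toy two-state chain makes the definition concrete; the states and transition probabilities below are invented purely for illustration:

```python
import random

# Transition probabilities of a toy chain on the state space S = {"A", "B"}
P = {"A": {"A": 0.9, "B": 0.1},
     "B": {"A": 0.5, "B": 0.5}}

def step(state: str) -> str:
    """Draw the next state; it depends only on the current state (Markov property)."""
    r, cumulative = random.random(), 0.0
    for next_state, p in P[state].items():
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state  # guard against floating-point rounding

# Simulate the chain; the empirical frequency of "A" approaches
# its stationary probability 5/6 ≈ 0.83.
chain = ["A"]
for _ in range(10_000):
    chain.append(step(chain[-1]))
print(chain.count("A") / len(chain))
```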

Using MCMC to calculate posterior probabilities Set S = the set of parameters (e.g. tree topology, mutation probability, branch lengths, etc.) Construct an MCMC with a stationary distribution equal to the posterior probability of the parameters Run the chain for a long time and sample from it regularly Use the samples to estimate the stationary distribution

Constructing our MCMC The state space S is defined as the parameter space Start with a random tree and random parameter values In each new generation, randomly propose either: A new tree topology A new value for a model parameter If the proposed state has a higher posterior probability P_proposed than the current state's P_current, the transition is accepted Otherwise the transition is accepted with probability P_proposed / P_current
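A generic Metropolis acceptance step, sketched in Python. The `propose` and `posterior` arguments stand in for the tree/parameter proposal mechanism and the unnormalised posterior P(D|T)P(T); they are hypothetical placeholders, not part of any specific phylogenetics package. Only the ratio of posteriors is used, so the intractable P(D) term cancels. This sketch assumes a symmetric proposal; the full Metropolis-Hastings rule adds a correction factor for asymmetric proposals.

```python
import random

def metropolis_step(current_state, current_posterior, propose, posterior):
    """One MCMC generation: propose a new state, then accept or reject it.

    A state with a higher posterior is always accepted; otherwise the
    proposal is accepted with probability posterior(proposed) / posterior(current).
    """
    proposed_state = propose(current_state)
    proposed_posterior = posterior(proposed_state)
    if proposed_posterior >= current_posterior:
        return proposed_state, proposed_posterior
    if random.random() < proposed_posterior / current_posterior:
        return proposed_state, proposed_posterior
    return current_state, current_posterior
```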

Algorithm visualization

Convergence issues An MCMC may need to run for a long time before its sampled distribution is close to the stationary distribution The initial convergence phase is called the "burn-in" phase We wish to minimize the burn-in time
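A minimal post-processing sketch, assuming the chain's visited states are stored one per generation; the burn-in cutoff and the thinning interval below are arbitrary illustrative values:

```python
def retained_samples(visited_states, burn_in=10_000, thin=100):
    """Drop the burn-in generations, then keep every `thin`-th state."""
    return visited_states[burn_in::thin]
```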

Avoiding getting stuck on local maxima Assume our posterior landscape looks like this: [Figure: a landscape with two peaks, one bordered by a big drop and the other by a small drop]

Avoiding local maxima (cont'd) Descending from a maximum can take a long time MCMCMC (Metropolis-coupled MCMC) speeds up the chain's "mixing" rate Instead of running a single chain, multiple chains are run simultaneously The chains are heated to different degrees

Chain heating The cold chain has stationary distribution P(T|D) Heated chain number i has stationary distribution P(T|D)^(1/i)

The MC3 algorithm Run multiple heated chains At each generation, attempt a swap between two chains If the swap is accepted, the hotter and cooler chains swap states Sample only from the cold chain
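A sketch of the swap-acceptance test between two chains. The `beta` exponents are the chains' heating levels (1 for the cold chain, e.g. 1/i for heated chain i as on the previous slide); the posteriors may be unnormalised, since only ratios are used.

```python
import random

def swap_accepted(posterior_i, posterior_j, beta_i, beta_j) -> bool:
    """Metropolis-coupled swap test between chains i and j.

    Each chain targets posterior ** beta; the swap is accepted with
    probability min(1, ratio), where the ratio compares the two chains
    after exchanging their current states.
    """
    ratio = (posterior_j ** beta_i * posterior_i ** beta_j) / \
            (posterior_i ** beta_i * posterior_j ** beta_j)
    return random.random() < min(1.0, ratio)
```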

Drawing conclusions To decide on the value of a parameter: draw a histogram showing the number of sampled trees in each interval and calculate the mean, mode, credibility intervals, etc. To find the most likely tree topologies: sort all sampled trees according to their posterior probabilities, then pick the most probable trees until the cumulative probability reaches 0.95 To check whether a certain group of organisms is monophyletic: find the fraction of sampled trees in which it is monophyletic; if it is monophyletic in 74% of the sampled trees, its estimated posterior probability of being monophyletic is 74%
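The summaries above amount to simple post-processing of the sampled trees. In the sketch below, `samples` and `is_monophyletic` are hypothetical placeholders for the cold chain's retained topologies (after burn-in) and a clade test:

```python
from collections import Counter

def credible_set(samples, mass=0.95):
    """Smallest set of topologies whose cumulative posterior probability reaches `mass`."""
    counts = Counter(samples)
    total = sum(counts.values())
    chosen, cumulative = [], 0.0
    for topology, count in counts.most_common():
        chosen.append(topology)
        cumulative += count / total
        if cumulative >= mass:
            break
    return chosen

def clade_support(samples, is_monophyletic):
    """Fraction of sampled trees in which the group of interest is monophyletic."""
    return sum(is_monophyletic(tree) for tree in samples) / len(samples)
```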

Summary Bayesian inference is very popular in many fields that rely on statistical analysis of observations The advent of fast computers gave rise to the use of MCMC in B.I., enabling multi-parameter analysis Fields of genomics using Bayesian methods: Identification of SNPs Inferring levels of gene expression and regulation Association mapping Etc.

THE END