
1 Bayesian inference Presented by Amir Hadadi
Based on “Bayesian inference using Markov Chain Monte Carlo in phylogenetic studies” by Torbjörn Karfunkel
Presented by Amir Hadadi
Bioinformatics seminar, spring 2005

2 What is Bayesian inference ?
Definition: “an approach to statistics in which all forms of uncertainty are expressed in terms of probability” (Radford M. Neal)

3 Probability reminder
Conditional probability: P(D∩T) = P(D|T)P(T) = P(T|D)P(D)
Bayes’ theorem: P(T|D) = P(D|T)P(T) / P(D)
P(T|D) is called the posterior probability of T.
P(T) is the prior probability, that is, the probability assigned to T before seeing the data.
P(D|T) is the likelihood of T, which is what we try to maximize in ML.
P(D) is the probability of observing the data D, regardless of which tree is correct.

4 Posterior vs. likelihood probabilities: Bayesian inference vs. Maximum likelihood
A box contains 100 dice, some fair, some biased:

Observation    Fair    Biased
1              1/6     1/21
2              1/6     2/21
3              1/6     3/21
4              1/6     4/21
5              1/6     5/21
6              1/6     6/21

5 Example continued
A die is drawn at random from the box.
Rolling the die twice gives us a 4 and a 6. Using the ML approach we get:
P(4, 6 | Fair) = 1/6 × 1/6 ≈ 0.028
P(4, 6 | Biased) = 4/21 × 6/21 ≈ 0.054
ML conclusion: the die is biased.
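
As a sanity check, the two likelihoods above can be reproduced in a few lines of Python (an illustrative sketch added here; the per-face probabilities are the ones from the table on the previous slide):

from fractions import Fraction

fair = {face: Fraction(1, 6) for face in range(1, 7)}
biased = {face: Fraction(face, 21) for face in range(1, 7)}  # per-face probabilities from the table

rolls = (4, 6)
lik_fair = fair[rolls[0]] * fair[rolls[1]]        # 1/36   ≈ 0.028
lik_biased = biased[rolls[0]] * biased[rolls[1]]  # 24/441 ≈ 0.054
print(float(lik_fair), float(lik_biased))
# The biased die gives the higher likelihood, so maximum likelihood picks “biased”.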

6 Example continued further
Assume we have prior knowledge about the distribution of dice inside the box: we know that the box contains 90 fair dice and 10 biased dice.

7 Example conclusion
Prior probabilities: P(Fair) = 0.9, P(Biased) = 0.1
Rolling the die twice gives us a 4 and a 6. Using the Bayesian approach we get:
P(Biased | 4, 6) = P(4, 6 | Biased) P(Biased) / P(4, 6) ≈ 0.179
B.I. conclusion: the die is fair.
Conclusion: ML and B.I. do not necessarily agree. How closely B.I. and ML results resemble each other depends on the strength of the prior assumptions we introduce.
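
The Bayesian calculation can be checked the same way (a sketch; the 0.9 / 0.1 prior comes from the 90 fair and 10 biased dice on the previous slide):

lik_fair, lik_biased = 1 / 36, 24 / 441     # likelihoods of rolling a 4 and a 6
prior_fair, prior_biased = 0.9, 0.1         # 90 fair dice, 10 biased dice in the box

evidence = lik_fair * prior_fair + lik_biased * prior_biased   # P(data)
post_biased = lik_biased * prior_biased / evidence
print(round(post_biased, 3))   # ≈ 0.179, so the die is more likely to be fair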

8 Steps in B.I.
Formulate a model of the problem.
Formulate a prior distribution which captures your beliefs before seeing the data.
Obtain the posterior distribution of the model parameters.

9 B.I. in phylogenetic reconstruction
Goal: finding an evolutionary tree which explains the data (observed species).
Methods of phylogenetic reconstruction:
Using a model of sequence evolution, e.g. maximum likelihood
Not using a model of sequence evolution, e.g. maximum parsimony, neighbor joining, etc.
Bayesian inference belongs to the first category.

10 Bayesian inference vs. Maximum likelihood
The basic question in Bayesian inference: “What is the probability that this model (T) is correct, given the data (D) that we have observed?”
Maximum likelihood asks a different question: “What is the probability of seeing the observed data (D) given that a certain model (T) is true?”
B.I. seeks P(T|D), while ML maximizes P(D|T).

11 Which priors should we assume ?
Knowledge about a parameter can be used to approximate its prior distribution.
Usually we don’t have prior knowledge about a parameter’s distribution; in this case a flat or vague prior is assumed.

12 A flat prior and a vague prior (figures illustrating the two prior shapes)

13 How to find the posterior probability P(T|D) ?
P(T) is the assumed prior.
P(D|T) is the likelihood.
Finding P(D) is infeasible – we would need to sum P(D|T)P(T) over the entire tree space.
Markov Chain Monte Carlo (MCMC) gives us an indirect way of finding P(T|D) without having to calculate P(D).
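
One step the slide leaves implicit is why MCMC can avoid P(D) at all: the sampler (see slide 18) only ever compares two candidate trees, and in that ratio the intractable normalizer cancels. In LaTeX:

\frac{P(T_1 \mid D)}{P(T_2 \mid D)}
  = \frac{P(D \mid T_1)\,P(T_1)\,/\,P(D)}{P(D \mid T_2)\,P(T_2)\,/\,P(D)}
  = \frac{P(D \mid T_1)\,P(T_1)}{P(D \mid T_2)\,P(T_2)}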

14 MCMC example (figure): a sequence of seven sampled states, giving empirical frequencies P(“Palestine”) = 3/7 and P(“Tree”) = 4/7; transition probability P = 1/2.

15 Symmetric simple random walk
Definition: a sequence of steps in ℤ, starting at 0 and moving one step left or right, each with probability ½.
Properties:
After n steps the average distance from 0 is of magnitude √n.
A random walk in one or two dimensions is recurrent.
A random walk in three or more dimensions is transient.
Brownian motion is a limit of a random walk.
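
A quick numerical check of the √n property (a small simulation sketch, not part of the original slides):

import math
import random

def avg_distance(n_steps, n_walks=2000):
    # Average |position| of a symmetric simple random walk after n_steps.
    total = 0
    for _ in range(n_walks):
        pos = 0
        for _ in range(n_steps):
            pos += random.choice((-1, 1))   # one step left or right, each with probability 1/2
        total += abs(pos)
    return total / n_walks

for n in (100, 400, 1600):
    print(n, round(avg_distance(n), 1), round(math.sqrt(n), 1))
# The average distance grows roughly like sqrt(n) (about 0.8 * sqrt(n)).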

16 Definition of a Markov chain
A special type of stochastic process: a sequence of random variables X0, X1, X2, … such that:
Each Xi takes values in a state space S = {s1, s2, …}
If x0, x1, …, xn+1 are elements of S, then:
P(Xn+1 = xn+1 | Xn = xn, Xn-1 = xn-1, …, X0 = x0) = P(Xn+1 = xn+1 | Xn = xn)
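
A minimal two-state example of the definition (the transition probabilities here are made up for illustration):

import random

# Transition probabilities of a hypothetical two-state chain on S = {"A", "B"}.
P = {"A": {"A": 0.9, "B": 0.1},
     "B": {"A": 0.5, "B": 0.5}}

def step(state):
    # The next state depends only on the current state (the Markov property).
    return "A" if random.random() < P[state]["A"] else "B"

state = "A"
counts = {"A": 0, "B": 0}
for _ in range(100_000):
    state = step(state)
    counts[state] += 1

# Empirical frequencies approach the chain's stationary distribution (5/6 and 1/6 here).
print({s: round(c / 100_000, 3) for s, c in counts.items()})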

17 Using MCMC to calculate posterior probabilities
Set the state space S = the set of parameters (e.g. tree topology, mutation probability, branch lengths, etc.).
Construct a Markov chain with a stationary distribution equal to the posterior probability of the parameters.
Run the chain for a long time and sample from it regularly.
Use the samples to find the stationary distribution.

18 Constructing our MCMC
The state space S is defined as the parameter space.
Start with a random tree and random parameter values.
In each new generation, randomly propose either:
a new tree topology, or
a new value for a model parameter.
If the proposed tree has a higher posterior probability than the current tree, the transition is accepted.
Otherwise the transition is accepted with probability P(proposed) / P(current).
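
A minimal sketch of this accept/reject rule on a toy three-state target (the target values and the proposal mechanism are illustrative assumptions, standing in for trees and the real model):

import random

target = {"T1": 1.0, "T2": 3.0, "T3": 6.0}   # hypothetical unnormalized posterior values
states = list(target)

def propose(current):
    # Propose a uniformly chosen different state (a stand-in for proposing a new topology or parameter value).
    return random.choice([s for s in states if s != current])

current = "T1"
counts = {s: 0 for s in states}
for _ in range(100_000):
    proposed = propose(current)
    # Accept if the proposal is more probable; otherwise accept with probability P(proposed)/P(current).
    if target[proposed] >= target[current] or random.random() < target[proposed] / target[current]:
        current = proposed
    counts[current] += 1

total = sum(counts.values())
print({s: round(counts[s] / total, 2) for s in states})   # ≈ 0.1, 0.3, 0.6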

19 Algorithm visualization

20 Convergence issues
An MCMC might run for a long time until its sampled distribution is close to the stationary distribution.
The initial convergence phase is called the “burn-in” phase.
We wish to minimize burn-in time.

21 Avoiding getting stuck on local maxima
Assume our landscape looks like this (figure of a posterior surface with two peaks, labeled “big drop” and “small drop”).

22 Avoiding local maxima (cont’d)
Descending a maximum can take a long time.
MCMCMC (Metropolis-coupled MCMC) speeds up the chain’s “mixing” rate.
Instead of running a single chain, multiple chains are run simultaneously.
The chains are heated to different degrees.

23 Chain heating
The cold chain has stationary distribution P(T|D).
Heated chain number i has stationary distribution P(T|D)^(1/i).
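
A tiny numerical illustration of what heating does (the three posterior values are made-up numbers): raising the distribution to the power 1/i flattens it, which makes it easier for a heated chain to move between peaks.

probs = [0.70, 0.02, 0.28]   # hypothetical posterior probabilities of three trees

def heat(p, i):
    # Heated chain number i targets a distribution proportional to p ** (1/i).
    raised = [x ** (1.0 / i) for x in p]
    z = sum(raised)
    return [round(x / z, 3) for x in raised]

for i in (1, 2, 4):
    print(i, heat(probs, i))
# As i grows, the heated distribution becomes flatter (closer to uniform).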

24 The MC3 algorithm
Run multiple heated chains.
At each generation, attempt a swap between two chains.
If the swap is accepted, the hotter and cooler chains will swap states.
Sample only from the cold chain.
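
A compact sketch of the swapping idea with one cold and one heated chain (the toy target and the standard swap-acceptance ratio used below are illustrative assumptions; the slides do not spell out the swap rule):

import random

target = {"T1": 1.0, "T2": 0.05, "T3": 2.0}   # hypothetical unnormalized posterior
states = list(target)
heats = [1.0, 0.5]                            # cold chain (power 1) and one heated chain (power 1/2)

def metropolis_step(x, beta):
    # One update of a chain whose stationary distribution is proportional to target ** beta.
    y = random.choice([s for s in states if s != x])
    return y if random.random() < min(1.0, (target[y] / target[x]) ** beta) else x

chain = ["T1", "T1"]          # current state of each chain
cold_samples = []
for _ in range(50_000):
    chain = [metropolis_step(x, b) for x, b in zip(chain, heats)]
    # Attempt a swap between the two chains.
    num = target[chain[1]] ** heats[0] * target[chain[0]] ** heats[1]
    den = target[chain[0]] ** heats[0] * target[chain[1]] ** heats[1]
    if random.random() < min(1.0, num / den):
        chain[0], chain[1] = chain[1], chain[0]
    cold_samples.append(chain[0])   # sample only from the cold chain

n = len(cold_samples)
print({s: round(cold_samples.count(s) / n, 2) for s in states})   # ≈ target, normalized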

25 Drawing conclusions
To decide the value of a parameter:
Draw a histogram showing the number of sampled trees in each interval and calculate the mean, mode, credibility intervals, etc.
To find the most likely tree topologies:
Sort all sampled trees according to their posterior probabilities.
Pick the most probable trees until the cumulative probability is 0.95.
To check whether a certain group of organisms is monophyletic:
Find the number of sampled trees in which it is monophyletic.
If it is monophyletic in 74% of the trees, it has a 74% probability of being monophyletic.
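
A small sketch of the last two calculations from a list of sampled topologies (the sample list and the clade check are placeholders; a real analysis would parse the trees rather than match strings):

from collections import Counter

# Hypothetical sampled topologies (e.g. Newick strings collected from the cold chain).
samples = ["((A,B),C);"] * 70 + ["((A,C),B);"] * 26 + ["((B,C),A);"] * 4
n = len(samples)

# 95% credible set: most probable topologies until the cumulative probability reaches 0.95.
credible_set, cumulative = [], 0.0
for tree, count in Counter(samples).most_common():
    credible_set.append(tree)
    cumulative += count / n
    if cumulative >= 0.95:
        break
print(credible_set, round(cumulative, 2))

# Probability that the group {A, B} is monophyletic = fraction of sampled trees containing that clade.
p_monophyletic = sum("(A,B)" in t for t in samples) / n
print(p_monophyletic)   # 0.7 → a 70% posterior probability of monophyly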

26 Summary
Bayesian inference is very popular in many fields requiring statistical analysis.
The advent of fast computers gave rise to the use of MCMC in B.I., enabling multi-parameter analysis.
Fields of genomics using Bayesian methods:
Identification of SNPs
Inferring levels of gene expression and regulation
Association mapping
Etc.

27 THE END

