Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU.


1 Lab3: Bayesian phylogenetic Inference and MCMC Department of Bioinformatics & Biostatistics, SJTU

2 Topics
– Phylogenetics
– Bayesian inference and MCMC: overview
– Bayesian model testing
– MrBayes tutorial and application: Nexus file, configuration of the process, how to execute the process, analyzing the results

3 Phylogenetics
Greek: phylum + genesis
Broad definition: the historical course of how species evolve and go extinct
Narrow definition: inferring the relationships among extant species
We prefer the narrow one

4 Inferring the relationships among three species (figure: the three species plus an outgroup)

5 Three possible trees (topologies): A, B, and C

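6 (Figure: a prior distribution over trees A, B, and C with total probability 1.0 is combined with the data (observations) to give a posterior distribution over the same trees, again with total probability 1.0)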

7 What is needed for inference?
– A probabilistic model of evolution
– A prior distribution on the parameters of the model
– Data
– A method for calculating the posterior distribution from the model, prior distribution, and data


9 Model: topology + branch lengths
Parameters: the topology and the branch lengths (figure: a tree on taxa A, B, C, D; branch lengths measure the expected amount of change)

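10 Model: molecular evolution
Parameters: the instantaneous rate matrix Q. Under the Jukes-Cantor model every substitution occurs at the same rate μ:
$$ Q = \begin{pmatrix} -3\mu & \mu & \mu & \mu \\ \mu & -3\mu & \mu & \mu \\ \mu & \mu & -3\mu & \mu \\ \mu & \mu & \mu & -3\mu \end{pmatrix} $$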
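
Under the Jukes-Cantor model the transition probabilities have a closed form. Assume every off-diagonal rate is μ and the branch length v = 3μt is measured in expected substitutions per site. Then a site stays in the same state with probability 1/4 + 3/4·e^(-4v/3). It moves to each specific other state with probability 1/4 - 1/4·e^(-4v/3). The minimal sketch below assumes that parameterization; the function name is illustrative.

```python
import math

def jc69_transition_matrix(v):
    """Return the 4x4 Jukes-Cantor transition-probability matrix P(v).

    v is the branch length in expected substitutions per site.
    Diagonal entries: probability of ending in the starting nucleotide.
    Off-diagonal entries: probability of ending in one specific other nucleotide.
    """
    e = math.exp(-4.0 * v / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    return [[p_same if i == j else p_diff for j in range(4)] for i in range(4)]

for row in jc69_transition_matrix(0.1):
    print([round(x, 4) for x in row])
```

At v = 0 the matrix is the identity. As v grows, every entry approaches 1/4, the equilibrium frequency of each nucleotide.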


12 Priors on parameters
Topology – all unique topologies have equal probability
Branch lengths – an exponential prior puts more weight on small branch lengths; it is approximately uniform on the transition probabilities
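
"Equal probability for all unique topologies" means the topology prior is 1/T. Here T = (2n - 5)!! is the number of unrooted binary topologies for n taxa. Rooted trees for n taxa number (2n - 3)!!, the same as unrooted trees for n + 1 taxa. That gives the three rooted trees for three species on slide 5. The sketch below combines this with independent exponential branch-length priors, assuming a rate of 10. An unrooted binary tree with n taxa has 2n - 3 branches. The function names are hypothetical.

```python
import math

def num_unrooted_topologies(n):
    """Number of distinct unrooted binary topologies for n >= 3 taxa: (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count

def log_tree_prior(n_taxa, branch_lengths, rate=10.0):
    """Log prior of one unrooted tree: uniform over topologies, i.i.d. Exp(rate) branch lengths."""
    log_p = -math.log(num_unrooted_topologies(n_taxa))  # every topology equally probable
    for v in branch_lengths:
        if v < 0:
            return float("-inf")                       # outside the support of the prior
        log_p += math.log(rate) - rate * v             # exponential log density
    return log_p

for n in (4, 5, 10, 20):
    print(n, num_unrooted_topologies(n))
```

The count grows super-exponentially. This is one reason why, as slide 20 puts it, the posterior cannot be obtained by enumeration or independent random sampling.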


14 Data: the alignment
Taxon  Characters
A      ACG TTA TTA AAT TGT CCT CTT TTC AGA
B      ACG TGT TTC GAT CGT CCT CTT TTC AGA
C      ACG TGT TTA GAC CGA CCT CGG TTA AGG
D      ACA GGA TTA GAT CGT CCG CTT TTC AGA
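
A quick way to get a feel for this alignment is to count pairwise differences. The sketch below computes each pair's p-distance (the proportion of differing sites). It also applies the standard Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), which estimates the branch length separating the two sequences. The helper names are hypothetical.

```python
import math
from itertools import combinations

# The alignment from slide 14; codon spacing is removed before comparing.
RAW = {
    "A": "ACG TTA TTA AAT TGT CCT CTT TTC AGA",
    "B": "ACG TGT TTC GAT CGT CCT CTT TTC AGA",
    "C": "ACG TGT TTA GAC CGA CCT CGG TTA AGG",
    "D": "ACA GGA TTA GAT CGT CCG CTT TTC AGA",
}
ALIGNMENT = {name: seq.replace(" ", "") for name, seq in RAW.items()}

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc69_distance(p):
    """Jukes-Cantor corrected distance; undefined (infinite) when p >= 0.75."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for x, y in combinations(sorted(ALIGNMENT), 2):
    p = p_distance(ALIGNMENT[x], ALIGNMENT[y])
    print(f"{x}-{y}: p = {p:.3f}, JC69 d = {jc69_distance(p):.3f}")
```

For example, A and B differ at 5 of the 27 sites.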


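16 Bayes’ theorem
$$ f(\theta \mid X) = \frac{f(\theta)\, f(X \mid \theta)}{f(X)} $$
where f(θ | X) is the posterior distribution, f(θ) the prior distribution, f(X | θ) the likelihood function, and f(X) the normalizing constant.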
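
When the parameter is discrete, such as the three trees on slide 5, Bayes' theorem reduces to multiplying the prior by the likelihood and normalizing. The normalizing constant f(X) is then simply the sum of prior × likelihood over all hypotheses. In the sketch below the likelihood values are hypothetical placeholders, not results computed from the slide-14 data.

```python
def posterior(prior, likelihood):
    """Return the normalized posterior over a discrete set of hypotheses."""
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}  # prior x likelihood
    f_x = sum(unnormalized.values())                             # normalizing constant f(X)
    return {k: u / f_x for k, u in unnormalized.items()}

prior = {"tree 1": 1 / 3, "tree 2": 1 / 3, "tree 3": 1 / 3}          # uniform topology prior
likelihood = {"tree 1": 1.0e-5, "tree 2": 3.0e-5, "tree 3": 6.0e-5}  # hypothetical f(X | tree)
print(posterior(prior, likelihood))
```

With a uniform prior, the posterior is just the normalized likelihood. A non-uniform prior would reweight the trees before normalization.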

17 (Figure: the posterior probability distribution over the parameter space, with regions corresponding to tree 1, tree 2, and tree 3; the high-dimensional parameter space is drawn in one dimension)

18 (Figure: marginal posterior probabilities of tree 1, tree 2, and tree 3: 20%, 48%, and 32%, respectively)
We can focus on any parameter of interest (there are no nuisance parameters) by marginalizing the posterior over the other parameters, that is, by integrating out the uncertainty in the other parameters. (The percentages denote the marginal probability distribution on trees.)

19 (Figure: a table of joint probabilities of trees and branch-length vectors, with the marginal probabilities of the trees and of the branch-length vectors in the margins)
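
Marginalization is easiest to see on a discretized joint table like the one sketched on slide 19. Summing each row integrates out the branch lengths and gives the marginal probability of each tree. Summing each column integrates out the tree. The joint values below are hypothetical, chosen so that the row sums reproduce the 20%/48%/32% of slide 18. The three coarse branch-length bins are likewise an illustrative simplification.

```python
# Hypothetical joint posterior probabilities f(tree, branch-length bin | X).
joint = {
    "tree 1": [0.10, 0.07, 0.03],
    "tree 2": [0.22, 0.18, 0.08],
    "tree 3": [0.12, 0.13, 0.07],
}

# Marginal probability of each tree: sum its row (integrate out branch lengths).
tree_marginals = {tree: sum(row) for tree, row in joint.items()}

# Marginal probability of each branch-length bin: sum its column (integrate out the tree).
bl_marginals = [sum(col) for col in zip(*joint.values())]

print(tree_marginals)
print(bl_marginals)
```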

20 How to estimate the posterior?
Analytical calculation? Impossible, except for very simple examples.
Independent random sampling of the parameter space? Also impossible: computationally infeasible.
Dependent sampling using the MCMC technique? Yes, you got it!

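21 Metropolis-Hastings sampling
Assume that the current state has parameter values θ. Consider a move to a state with parameter values θ*, proposed according to the proposal density q(θ* | θ). Accept the move with probability
$$ r = \min\left\{1,\ \frac{f(\theta^*)}{f(\theta)} \times \frac{f(X \mid \theta^*)}{f(X \mid \theta)} \times \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)}\right\} $$
(prior ratio × likelihood ratio × proposal ratio)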
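
To make the acceptance rule concrete, here is a minimal Metropolis-Hastings sampler for a single parameter. The parameter is the Jukes-Cantor distance v between sequences A and B from slide 14, which differ at 5 of 27 sites. The sampler assumes an Exp(10) branch-length prior. The proposal is a multiplier move, v* = v·m with m = exp(λ(u - 1/2)) and u uniform on (0, 1); for this move the proposal ratio equals m. The tuning value λ = 1.0, the starting value, and all names are illustrative assumptions.

```python
import math
import random

N_SITES, N_DIFF = 27, 5  # sequences A and B differ at 5 of 27 sites
RATE = 10.0              # assumed Exp(10) prior on the branch length v

def log_prior(v):
    """Log density of the Exp(RATE) prior; -inf outside the support."""
    return math.log(RATE) - RATE * v if v > 0 else float("-inf")

def log_likelihood(v):
    """Jukes-Cantor log-likelihood for two sequences separated by branch length v."""
    e = math.exp(-4.0 * v / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)                    # pi * P(same state after v)
    p_diff = 0.25 * 0.25 * -math.expm1(-4.0 * v / 3.0)   # pi * P(one specific other state)
    return (N_SITES - N_DIFF) * math.log(p_same) + N_DIFF * math.log(p_diff)

def metropolis_hastings(n_iter, v0=0.1, tuning=1.0, seed=1):
    """Sample v from its posterior using a multiplier proposal v* = v * m."""
    rng = random.Random(seed)
    v = v0
    log_post = log_prior(v) + log_likelihood(v)
    samples = []
    for _ in range(n_iter):
        m = math.exp(tuning * (rng.random() - 0.5))
        v_new = v * m
        log_post_new = log_prior(v_new) + log_likelihood(v_new)
        # log r = log(prior ratio) + log(likelihood ratio) + log(proposal ratio);
        # for the multiplier move, q(v | v*) / q(v* | v) = m.
        log_r = log_post_new - log_post + math.log(m)
        if rng.random() < math.exp(min(0.0, log_r)):
            v, log_post = v_new, log_post_new
        samples.append(v)
    return samples

samples = metropolis_hastings(20000)
kept = samples[2000:]  # discard burn-in
print(f"posterior mean of v: {sum(kept) / len(kept):.3f}")
```

Forgetting the proposal ratio would bias the chain toward small values of v. The multiplier move is asymmetric: it proposes small steps near zero and large steps far from it.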

22 Sampling principles
– For a complex model, you typically have many “proposal” or “update” mechanisms (“moves”)
– Each mechanism changes one or a few parameters
– At each step (generation of the chain), one mechanism is chosen randomly according to some predetermined probability distribution
– It makes sense to try changing ‘more difficult’ parameters (such as the topology in a phylogenetic analysis) more often
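
Choosing one move per generation from a predetermined distribution takes only a weighted random draw. In the sketch below, the move names and their relative probabilities are hypothetical; MrBayes has its own set of moves and proposal probabilities. The topology move is given the largest weight, following the advice to update difficult parameters more often.

```python
import random
from collections import Counter

# Hypothetical moves and relative proposal probabilities.
MOVES = ["topology", "branch_length", "rate_matrix", "gamma_shape"]
WEIGHTS = [0.5, 0.3, 0.1, 0.1]

def choose_move(rng):
    """Pick the single update mechanism used in this generation of the chain."""
    return rng.choices(MOVES, weights=WEIGHTS, k=1)[0]

rng = random.Random(42)
counts = Counter(choose_move(rng) for _ in range(10000))
print(counts)
```

Each chosen move would then propose new values for its own parameters and be accepted or rejected with the Metropolis-Hastings rule.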

23 Application example: analysis of 85 insect taxa based on 18S rDNA

24 Model parameters (1): the General Time Reversible (GTR) substitution model, the topology, and the branch lengths (figure: a tree on taxa A, B, C, D)
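
The GTR rate matrix is built from six symmetric exchangeability rates and four stationary frequencies, with Q_ij = r_ij·π_j for i ≠ j. Each diagonal entry makes its row sum to zero. The matrix is then scaled so that branch lengths are in expected substitutions per site. The sketch below assumes the nucleotide order A, C, G, T and the exchangeability order AC, AG, AT, CG, CT, GT; the function name is hypothetical.

```python
def gtr_rate_matrix(exch, freqs):
    """Build a GTR instantaneous rate matrix Q, scaled to one expected substitution per unit time.

    exch:  six exchangeabilities in the order AC, AG, AT, CG, CT, GT.
    freqs: four stationary frequencies (pi_A, pi_C, pi_G, pi_T) summing to 1.
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    r = [[0.0] * 4 for _ in range(4)]
    for (i, j), x in zip(pairs, exch):
        r[i][j] = r[j][i] = x                        # exchangeabilities are symmetric
    q = [[r[i][j] * freqs[j] if i != j else 0.0 for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i])                         # each row of a rate matrix sums to zero
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]

# With equal exchangeabilities and equal frequencies, GTR reduces to Jukes-Cantor.
for row in gtr_rate_matrix([1.0] * 6, [0.25] * 4):
    print([round(x, 4) for x in row])
```

Because π_i·Q_ij = π_i·r_ij·π_j is symmetric in i and j, the process is time reversible. That property gives the model its name.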

25 Model parameters (2): gamma-shaped rate variation across sites

26 Priors on parameters
Topology – all unique topologies have equal probability
Branch lengths – exponential prior (Exp(10) has mean 1/10 = 0.1)
State frequencies – Dirichlet prior: Dir(1,1,1,1)
Rates (revmat) – Dirichlet prior: Dir(1,1,1,1,1,1)
Shape of the gamma distribution of rates – uniform prior: Uni(0,100)
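
Drawing from these priors is a useful sanity check on what each one implies. The sketch below uses Python's standard `random` module. A Dirichlet draw is obtained by normalizing independent Gamma(α, 1) draws; Dir(1,1,1,1) is therefore uniform over all sets of four frequencies that sum to 1. The seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(7)

# Branch length ~ Exp(10): expovariate takes the rate, so the mean is 1/10 = 0.1.
branch_lengths = [rng.expovariate(10.0) for _ in range(100000)]

def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

state_freqs = sample_dirichlet([1, 1, 1, 1], rng)   # Dir(1,1,1,1)
revmat = sample_dirichlet([1] * 6, rng)             # Dir(1,1,1,1,1,1)
shape = rng.uniform(0.0, 100.0)                     # Uni(0,100) for the gamma shape

print(f"mean branch length: {sum(branch_lengths) / len(branch_lengths):.4f}")
print(state_freqs, revmat, shape)
```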


28 (Figure: MCMC trace showing the burn-in, followed by the stationary phase, which is sampled with thinning; rapid mixing is essential)
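
After the run, burn-in removal and thinning come down to slicing the chain. The sketch below discards the first 25% of samples and keeps every 10th sample after that. Both values are arbitrary assumptions; the burn-in should be chosen by inspecting the trace.

```python
def discard_burnin_and_thin(chain, burnin_fraction=0.25, thin=10):
    """Drop the first burnin_fraction of the chain, then keep every thin-th sample."""
    start = int(len(chain) * burnin_fraction)
    return chain[start::thin]

chain = list(range(1000))  # stand-in for 1000 sampled parameter values
kept = discard_burnin_and_thin(chain)
print(len(kept), kept[:3])
```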

29 Majority-rule consensus tree from the sampled trees
Clade frequencies represent the posterior probabilities of the clades: the probability that a clade is true given the data, model, and prior (and given that the MCMC sample is adequate)
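
Clade posterior probabilities are simply the frequencies with which each clade appears among the sampled trees. The majority-rule consensus keeps the clades found in more than half of them. Such clades are always mutually compatible: two conflicting clades can never occur in the same tree, so their frequencies cannot both exceed 50%. In the sketch below, each tree is represented as a set of clades (frozensets of taxon names), and the sample of trees is hypothetical.

```python
from collections import Counter

def clade_frequencies(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}

def majority_rule_clades(sampled_trees):
    """Clades present in more than half of the sampled trees, with their frequencies."""
    return {clade: f for clade, f in clade_frequencies(sampled_trees).items() if f > 0.5}

# Hypothetical sample of ten rooted trees on taxa A, B, C, D.
AB, CD, ABC, BC = (frozenset(s) for s in ("AB", "CD", "ABC", "BC"))
trees = [{AB, ABC}] * 6 + [{AB, CD}] * 2 + [{BC, ABC}] * 2
for clade, f in majority_rule_clades(trees).items():
    print("".join(sorted(clade)), f)
```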

30 Mean and 95% credibility interval for model parameters
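
The posterior mean and a 95% credibility interval can be read directly from the post-burn-in samples of a parameter. The sketch below computes an equal-tailed interval from the sorted samples, using a simple nearest-rank rule for the 2.5% and 97.5% quantiles. Quantile conventions vary, and summaries reported by MrBayes may use a different one.

```python
import math

def mean_and_credible_interval(samples, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC samples."""
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - level) / 2.0
    lo = xs[int(math.floor(tail * (n - 1)))]          # lower quantile
    hi = xs[int(math.ceil((1.0 - tail) * (n - 1)))]   # upper quantile
    return sum(xs) / n, (lo, hi)

post_mean, (lower, upper) = mean_and_credible_interval(list(range(101)))
print(post_mean, lower, upper)
```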

31 MrBayes tutorial Introduction/examples

32 Nexus-format input file
Input: Nexus format; more precisely, a Nexus-like ("nexus-ish") dialect
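
A minimal Nexus input file for the four-taxon alignment of slide 14 can be generated with a few lines of Python. The helper name is hypothetical. The output uses only the basic DATA-block commands (dimensions, format, matrix); optional format settings such as missing-data or gap symbols are left out and should be checked against the MrBayes manual.

```python
def nexus_data_block(alignment, datatype="dna"):
    """Format an alignment {taxon: sequence} as a minimal Nexus DATA block."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    lines = [
        "#NEXUS",
        "",
        "begin data;",
        f"  dimensions ntax={len(alignment)} nchar={nchar};",
        f"  format datatype={datatype};",
        "  matrix",
    ]
    for name, seq in alignment.items():
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"

# Alignment from slide 14; codon spacing is removed before writing.
RAW = {
    "A": "ACG TTA TTA AAT TGT CCT CTT TTC AGA",
    "B": "ACG TGT TTC GAT CGT CCT CTT TTC AGA",
    "C": "ACG TGT TTA GAC CGA CCT CGG TTA AGG",
    "D": "ACA GGA TTA GAT CGT CCG CTT TTC AGA",
}
ALIGNMENT = {name: seq.replace(" ", "") for name, seq in RAW.items()}
print(nexus_data_block(ALIGNMENT))
```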

33 … …


36 Running MrBayes
– Use execute to bring the data in a Nexus file into MrBayes
– Set the model and priors using lset and prset
– Run the chain using mcmc
– Summarize the parameter samples using sump
– Summarize the tree samples using sumt
Note that MrBayes 3.1 runs two independent analyses by default
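
The commands above can also be collected in a MrBayes block appended to the Nexus data file, which is then run with `execute <file>`. The Python helper below writes such a block for the GTR + gamma model and the priors of slide 26. Its name, ngen, and samplefreq are hypothetical choices. The option names are standard lset, prset, and mcmc options. However, prior syntax, defaults, and the burn-in behavior of sump and sumt differ between MrBayes versions, so check `help prset`, `help sump`, and `help sumt` in the version you use.

```python
def mrbayes_block(ngen=1_000_000, samplefreq=100, nruns=2, diagnfreq=1000):
    """Return a MrBayes command block for a GTR + gamma analysis with the slide-26 priors."""
    commands = [
        "begin mrbayes;",
        "  lset nst=6 rates=gamma;",                        # GTR with gamma rate variation
        "  prset brlenspr=unconstrained:exp(10.0);",        # Exp(10) branch-length prior
        "  prset statefreqpr=dirichlet(1,1,1,1);",
        "  prset revmatpr=dirichlet(1,1,1,1,1,1);",
        "  prset shapepr=uniform(0,100);",                  # check accepted range in your version
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns={nruns} "
        f"mcmcdiagn=yes diagnfreq={diagnfreq};",
        "  sump;",                                          # set burn-in as your version requires
        "  sumt;",
        "end;",
    ]
    return "\n".join(commands) + "\n"

print(mrbayes_block())
```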

37 Convergence diagnostics
– By default, MrBayes performs two independent analyses starting from different random trees (mcmc nruns=2)
– The average standard deviation of clade frequencies is calculated and reported during the run (mcmc mcmcdiagn=yes diagnfreq=1000) and written to a file (.mcmc)
– The standard deviation of each clade frequency and the potential scale reduction factor for branch lengths are calculated with sumt
– The potential scale reduction factor is calculated for all substitution model parameters with sump
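
The potential scale reduction factor (PSRF) compares the variance between independent runs with the variance within them. It approaches 1 as the runs converge to the same distribution. The sketch below implements the basic Gelman-Rubin form; the exact formula MrBayes uses may differ in detail. The simulated "runs" are stand-ins, not MCMC output.

```python
import math
import random
from statistics import mean, variance

def psrf(chains):
    """Potential scale reduction factor (basic Gelman-Rubin form).

    chains: list of m >= 2 equal-length lists of samples of one parameter.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-run variance
    w = mean([variance(c) for c in chains])                               # within-run variance
    v_hat = (n - 1) / n * w + b / n                                       # pooled variance estimate
    return math.sqrt(v_hat / w)

rng = random.Random(3)
run1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
run2 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
run3 = [x + 5.0 for x in run1]  # a run stuck in a different region of parameter space
print(round(psrf([run1, run2]), 3), round(psrf([run1, run3]), 3))
```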

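38 Bayes’ theorem and the marginal likelihood (of the model)
We have implicitly conditioned on a model M:
$$ f(\theta \mid X, M) = \frac{f(\theta \mid M)\, f(X \mid \theta, M)}{f(X \mid M)} $$
where f(X | M) is the marginal likelihood of the model.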

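39 Bayesian model choice
Posterior model odds:
$$ \frac{f(M_1 \mid X)}{f(M_0 \mid X)} = \frac{f(M_1)}{f(M_0)} \times \frac{f(X \mid M_1)}{f(X \mid M_0)} $$
Bayes factor:
$$ B_{10} = \frac{f(X \mid M_1)}{f(X \mid M_0)} $$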

40 Bayesian model choice
– The normalizing constant in Bayes’ theorem, the marginal probability of the model, f(X) or f(X|M), can be used for model choice
– f(X|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run (MrBayes does this automatically with sump)
– Any models can be compared: nested, non-nested, data-derived
– No explicit correction for the number of parameters is needed
– The marginal likelihood can prefer a simpler model over a more complex model
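
The harmonic-mean estimate of f(X|M) is n / Σ 1/L_i over the sampled likelihood values L_i. Those likelihoods underflow ordinary floating point, so the sketch below works on log-likelihoods and uses the log-sum-exp trick. The input values are hypothetical. The estimator is simple but can be unstable, so treat it as a rough estimate.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Harmonic-mean estimate of ln f(X|M) from sampled log-likelihoods ln L_i.

    HM = n / sum_i (1 / L_i), so ln HM = ln n - ln sum_i exp(-ln L_i).
    """
    neg = [-x for x in log_likelihoods]
    top = max(neg)
    log_sum = top + math.log(sum(math.exp(v - top) for v in neg))  # log-sum-exp
    return math.log(len(log_likelihoods)) - log_sum

# Hypothetical post-burn-in log-likelihood values from one run.
print(log_harmonic_mean([-4021.7, -4019.3, -4025.1, -4020.8]))
```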

41 Bayes factor comparisons
Interpretation of the Bayes factor:
2 ln(B10)   B10         Evidence against M0
0 to 2      1 to 3      Not worth more than a bare mention
2 to 6      3 to 20     Positive
6 to 10     20 to 150   Strong
> 10        > 150       Very strong
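
Given estimates of the log marginal likelihoods of two models, 2 ln B10 is just twice their difference. It can then be mapped onto the categories in the table above. In the sketch below, the log marginal likelihoods are hypothetical. Assigning a value that falls exactly on a boundary to the lower category is an arbitrary choice. A negative value means the evidence favors M0.

```python
def two_ln_bayes_factor(log_ml_1, log_ml_0):
    """2 ln B10 from the log marginal likelihoods of models M1 and M0."""
    return 2.0 * (log_ml_1 - log_ml_0)

def evidence_against_m0(two_ln_b10):
    """Verbal category for 2 ln B10, following the interpretation table."""
    if two_ln_b10 < 0:
        return "Evidence favors M0"
    if two_ln_b10 <= 2:
        return "Not worth more than a bare mention"
    if two_ln_b10 <= 6:
        return "Positive"
    if two_ln_b10 <= 10:
        return "Strong"
    return "Very strong"

# Hypothetical log marginal likelihoods, e.g. M1 = GTR + gamma, M0 = Jukes-Cantor.
score = two_ln_bayes_factor(-1234.5, -1238.0)
print(score, evidence_against_m0(score))
```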


