Gibbs sampling for motif finding
Yves Moreau

2 Overview
- Markov Chain Monte Carlo
- Gibbs sampling
- Motif finding in cis-regulatory DNA
- Biclustering microarray data

3 Markov Chain Monte Carlo
A Markov chain over the states X = A, C, G, T, with a 4 x 4 transition matrix T whose rows and columns are indexed by A, C, G, T.
[figure: transition diagram and transition matrix over the four nucleotides]

4 Markov Chain Monte Carlo
Markov chains can sample from complex distributions. Example sequences generated by such a chain:
ACGCGGTGTGCGTTTGACGA ACGGTTACGCGACGTTTGGT ACGTGCGGTGTACGTGTACG ACGGAGTTTGCGGGACGCGT
ACGCGCGTGACGTACGCGTG AGACGCGTGCGCGCGGACGC ACGGGCGTGCGCGCGTCGCG AACGCGTTTGTGTTCGGTGC
ACCGCGTTTGACGTCGGTTC ACGTGACGCGTAGTTCGACG ACGTGACACGGACGTACGCG ACCGTACTCGCGTTGACACG
ATACGGCGCGGCGGGCGCGG ACGTACGCGTACACGCGGGA ACGCGCGTGTTTACGACGTG ACGTCGCACGCGTCGGTGTG
ACGGCGGTCGGTACACGTCG ACGTTGCGACGTGCGTGCTG ACGGAACGACGACGCGACGC ACGGCGTGTTCGCGGTGCGG
[figure: per-position percentages of A, C, G and T across the sampled sequences]
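To make this concrete, here is a minimal Python sketch of such a chain; the transition matrix values below are invented for illustration and are not the matrix from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
states = ["A", "C", "G", "T"]

# Hypothetical 4x4 transition matrix T; row i gives P(next state | current state i).
T = np.array([
    [0.1, 0.5, 0.3, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.3, 0.2, 0.1, 0.4],
    [0.4, 0.3, 0.2, 0.1],
])

def sample_sequence(T, length=20, start=0):
    """Run the Markov chain for `length` steps and return the visited states."""
    seq, state = [], start
    for _ in range(length):
        seq.append(states[state])
        state = rng.choice(4, p=T[state])
    return "".join(seq)

print("\n".join(sample_sequence(T) for _ in range(5)))

# Per-position nucleotide frequencies over many sampled sequences
# (the kind of percentage-by-position summary shown on the slide).
many = np.array([list(sample_sequence(T)) for _ in range(1000)])
print("fraction of A per position:", np.round((many == "A").mean(axis=0), 2))
```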

5 Markov Chain Monte Carlo
Let us look at the transition after two steps: P(X_{t+2} = y | X_t = x) = Σ_z T(x, z) T(z, y) = (T^2)(x, y). Similarly, after n steps the transition probabilities are given by the matrix power T^n.

6 Markov Chain Monte Carlo
Stationary distribution π: if the samples at one step are distributed according to π, the samples at the next step are also distributed according to π, i.e. π T = π, so π is a left eigenvector of T with eigenvalue 1.
Equilibrium distribution: the rows of the limit of T^n are stationary distributions.
From an arbitrary initial condition and after a sufficient number of steps (burn-in), the successive states of the Markov chain are samples from the stationary distribution.
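A small numerical sketch of these two facts, using the same illustrative matrix as above (the numbers are mine, not from the slides): the rows of T^n converge to π, and π is the left eigenvector of T for eigenvalue 1.

```python
import numpy as np

T = np.array([
    [0.1, 0.5, 0.3, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.3, 0.2, 0.1, 0.4],
    [0.4, 0.3, 0.2, 0.1],
])

# After n steps the transition probabilities are T^n; for an irreducible,
# aperiodic chain every row converges to the stationary distribution pi.
Tn = np.linalg.matrix_power(T, 50)
print(np.round(Tn, 4))            # all rows are (approximately) equal to pi

# pi as a left eigenvector of T: pi T = pi, normalized to sum to 1.
vals, vecs = np.linalg.eig(T.T)   # left eigenvectors of T = right eigenvectors of T^T
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print(np.round(pi, 4))
print(np.allclose(pi @ T, pi))    # True: pi is stationary
```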

7 Detailed balance
A sufficient condition for the Markov chain to converge to the stationary distribution π is that π and T satisfy the detailed balance condition: π(x) T(x, y) = π(y) T(y, x) for all states x, y.
Proof that detailed balance implies stationarity: Σ_x π(x) T(x, y) = Σ_x π(y) T(y, x) = π(y) Σ_x T(y, x) = π(y), so π T = π.
Problem: disjoint regions in probability space. Detailed balance alone does not guarantee convergence if the chain cannot move between disjoint regions that each carry probability mass (i.e. if it is not irreducible).
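Both the condition and the proof can be checked numerically. The sketch below builds a reversible chain from a symmetric weight matrix (a standard construction chosen for illustration, not taken from the slides) and verifies detailed balance and the stationarity it implies:

```python
import numpy as np

# Reversible chain from a symmetric weight matrix W (random walk on a weighted graph):
# T(x, y) = W[x, y] / W[x].sum(), with stationary distribution pi(x) proportional to W[x].sum().
W = np.array([
    [0.0, 2.0, 1.0, 1.0],
    [2.0, 0.0, 3.0, 1.0],
    [1.0, 3.0, 0.0, 2.0],
    [1.0, 1.0, 2.0, 0.0],
])
T = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()

# Detailed balance: pi(x) T(x, y) == pi(y) T(y, x) for all pairs (x, y).
flow = pi[:, None] * T
print(np.allclose(flow, flow.T))   # True

# Detailed balance implies stationarity: sum_x pi(x) T(x, y) = pi(y).
print(np.allclose(pi @ T, pi))     # True
```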

8 Gibbs sampling
Markov chain for Gibbs sampling: each transition picks one variable x_i and resamples it from its full conditional distribution p(x_i | x_{-i}), where x_{-i} denotes all the other variables; cycling (or randomly scanning) through the variables defines the chain.

9 Gibbs sampling
Detailed balance for the Gibbs sampler. Consider the transition that resamples coordinate i, taking x = (x_i, x_{-i}) to y = (y_i, x_{-i}) with probability p(y_i | x_{-i}).
Prove detailed balance:
p(x) T(x -> y) = p(x_i, x_{-i}) p(y_i | x_{-i}) = p(x_i | x_{-i}) p(x_{-i}) p(y_i | x_{-i})   (Bayes' rule)
This expression is symmetric in x_i and y_i, hence equals p(y_i, x_{-i}) p(x_i | x_{-i}) = p(y) T(y -> x). Q.E.D.
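To make the full conditionals concrete, here is a minimal Gibbs sampler for a standard textbook target, a bivariate normal with correlation 0.8; the example and all names in it are mine, not from the slides. Each sweep resamples one coordinate from its conditional given the other, which is exactly the kernel whose detailed balance was proved above.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8                      # correlation of the target bivariate normal
n_samples, burn_in = 20000, 1000

x, y = 0.0, 0.0
samples = []
for t in range(n_samples):
    # Full conditionals of a standard bivariate normal with correlation rho:
    # x | y ~ N(rho * y, 1 - rho^2)  and  y | x ~ N(rho * x, 1 - rho^2)
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    if t >= burn_in:
        samples.append((x, y))

samples = np.array(samples)
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])  # close to 0.8
```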

10 Data augmentation
- Introducing unobserved variables often simplifies the expression of the likelihood
- A Gibbs sampler can then be set up over both the parameters and the unobserved variables
- Samples from the Gibbs sampler can be used to estimate the parameters (see the mixture-model sketch below)
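A classic instance of data augmentation (my example, not from the slides): for a two-component Gaussian mixture, introducing the unobserved component labels z_i turns an awkward likelihood into simple full conditionals, and the Gibbs samples of the means estimate the parameters. Sketch with unit variances, fixed mixing weights and a flat prior on the means:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a two-component mixture (true means -2 and 3, unit variance).
data = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 100)])
n = len(data)

mu = np.array([-1.0, 1.0])      # initial component means
weights = np.array([0.5, 0.5])  # mixing weights, kept fixed for simplicity
mu_trace = []

for it in range(2000):
    # 1. Sample the hidden labels z_i from their full conditional (data augmentation).
    lik = weights * np.exp(-0.5 * (data[:, None] - mu[None, :]) ** 2)
    prob = lik / lik.sum(axis=1, keepdims=True)
    z = (rng.random(n) < prob[:, 1]).astype(int)

    # 2. Sample each mean from its full conditional given the labels
    #    (flat prior, unit variance -> Normal(sample mean, 1/count)).
    for k in (0, 1):
        members = data[z == k]
        count = max(len(members), 1)          # guard against an empty component
        mu[k] = rng.normal(members.sum() / count, 1.0 / np.sqrt(count))

    mu_trace.append(mu.copy())

mu_trace = np.array(mu_trace[500:])                          # discard burn-in
print("posterior mean estimates:", mu_trace.mean(axis=0))    # close to (-2, 3)
```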

11 Pros and cons
Pros:
- Clear probabilistic interpretation
- Bayesian framework
- “Global optimization”
Cons:
- Mathematical details not easy to work out
- Relatively slow

12 Motif finding

13 Gibbs sampling for motif finding
Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment (the occurrence positions) given the sequences, updating the alignment sequence by sequence (Lawrence et al.).
Model assumptions:
- One motif of fixed length
- One occurrence per sequence
- Background model based on single nucleotides
Limitations:
- Too sensitive to noise
- Lots of parameter tuning
A minimal code sketch of this sampler is given below.
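The sketch below is written in the spirit of the site sampler described on this slide (one motif of fixed width, exactly one occurrence per sequence, single-nucleotide background); it is an illustrative reimplementation, not the original code. Each sweep holds one sequence out, rebuilds the motif matrix from the remaining alignment, scores every candidate start in the held-out sequence against the background, and samples a new occurrence position from those scores.

```python
import numpy as np

rng = np.random.default_rng(3)
ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def motif_matrix(seqs, positions, w, pseudo=0.5):
    """Motif (position weight) matrix, w x 4, built from the current alignment."""
    counts = np.full((w, 4), pseudo)
    for s, p in zip(seqs, positions):
        for j, c in enumerate(s[p:p + w]):
            counts[j, IDX[c]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def background(seqs, pseudo=0.5):
    """Single-nucleotide background frequencies."""
    counts = np.full(4, pseudo)
    for s in seqs:
        for c in s:
            counts[IDX[c]] += 1
    return counts / counts.sum()

def gibbs_motif(seqs, w, n_iter=200):
    """Site sampler: one occurrence of a width-w motif per sequence."""
    positions = [int(rng.integers(0, len(s) - w + 1)) for s in seqs]
    bg = background(seqs)
    for _ in range(n_iter):
        for i in range(len(seqs)):          # update the alignment sequence by sequence
            others = [s for k, s in enumerate(seqs) if k != i]
            other_pos = [p for k, p in enumerate(positions) if k != i]
            pwm = motif_matrix(others, other_pos, w)
            s = seqs[i]
            # Score every candidate start: motif likelihood / background likelihood.
            scores = np.array([
                np.prod([pwm[j, IDX[s[p + j]]] / bg[IDX[s[p + j]]] for j in range(w)])
                for p in range(len(s) - w + 1)
            ])
            # Sample (rather than maximize) the new occurrence position.
            positions[i] = int(rng.choice(len(scores), p=scores / scores.sum()))
    return positions, motif_matrix(seqs, positions, w)

# Toy usage: a planted "TATAAT" motif inside random sequences.
def random_seq(n):
    return "".join(rng.choice(list(ALPHABET), n))

seqs = [random_seq(30) + "TATAAT" + random_seq(30) for _ in range(10)]
pos, pwm = gibbs_motif(seqs, w=6)
print(pos)                 # typically concentrates near position 30, where the motif sits
print(np.round(pwm, 2))
```

Real implementations add phase-shift moves and multiple restarts, which this sketch omits.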

14 [figure: upstream regions of 500 bp before the translation start, in which motif occurrences are searched]

15-21 Gibbs motif finding (the iteration illustrated step by step on successive slides)
Initialization: sequences; random motif matrix
Iteration: sequence scoring; alignment update (motif instances); motif matrix update
Termination: convergence of the alignment and of the motif matrix

22 Gibbs motif finding
Initialization: sequences; random motif matrix
Iteration: sequence scoring; alignment update (motif instances); motif matrix update
Termination: stabilization of the motif matrix (not of the alignment); a simple stopping test is sketched below
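One simple way to implement this termination criterion is to compare the motif matrix before and after each sweep and stop once the change falls below a tolerance; a sketch (the function name and tolerance are my own choices):

```python
import numpy as np

def has_converged(pwm_old, pwm_new, tol=1e-3):
    """Termination test on the motif matrix: stop once it has stabilized,
    even if individual alignment positions keep flickering between
    equally good choices."""
    return np.max(np.abs(pwm_new - pwm_old)) < tol

# Inside the Gibbs loop one would keep the previous sweep's matrix and call:
#   if has_converged(pwm_prev, pwm_curr): break
```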

23 Motif Sampler (extended Gibbs sampling)
Model:
- One motif of fixed length per round
- Several occurrences per sequence
- Each sequence has a discrete probability distribution over the number of copies of the motif (up to a maximum bound)
- Multiple motifs found in successive rounds by masking occurrences of previous motifs
- Improved background model based on oligonucleotides (see the sketch below)
- Gapped motifs
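The oligonucleotide-based background model amounts to scoring non-motif positions with a higher-order Markov model estimated from background sequence. A minimal sketch of an order-k background (illustrative only, not the Motif Sampler's actual implementation):

```python
import numpy as np
from collections import defaultdict

def train_background(seqs, k=2, pseudo=0.5):
    """Order-k Markov background: P(next base | preceding k-mer)."""
    idx = {c: i for i, c in enumerate("ACGT")}
    counts = defaultdict(lambda: np.full(4, pseudo))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][idx[s[i]]] += 1
    return {ctx: c / c.sum() for ctx, c in counts.items()}, idx

def background_loglik(s, model, idx, k=2):
    """Log-likelihood of a sequence segment under the order-k background."""
    total = 0.0
    for i in range(k, len(s)):
        probs = model.get(s[i - k:i], np.full(4, 0.25))  # unseen context -> uniform
        total += np.log(probs[idx[s[i]]])
    return total

# Usage sketch: this background score would replace the single-nucleotide
# background when scoring candidate motif positions.
model, idx = train_background(["ACGTACGTTTGACGTG", "CGGTGTGCGTTTGACG"], k=2)
print(background_loglik("ACGTTTGA", model, idx, k=2))
```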

24 [figure: 500 bp regions upstream of the translation start]