Gibbs sampling.


1 Gibbs sampling

2 The Motif Finding Problem
Given a set of DNA sequences:
cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
Find the motif in each of the individual sequences.

3 The Motif Finding Problem
If the starting positions s = (s1, s2, …, st) are given, finding the consensus is easy: we simply construct (and evaluate) the profile to find the motif.
But the starting positions s are usually not given. How can we find the “best” profile matrix?
Gibbs sampling
Expectation-Maximization (EM) algorithm
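The easy case, where the starting positions are known, can be sketched in a few lines. This is a minimal illustration only: the function names `build_profile` and `consensus`, the toy sequences, and the alphabetical tie-breaking rule are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

def build_profile(seqs, starts, width):
    """Return one Counter of base counts per motif column."""
    windows = [s[a:a + width] for s, a in zip(seqs, starts)]
    return [Counter(w[i] for w in windows) for i in range(width)]

def consensus(profile):
    """Most frequent base in each column (ties broken alphabetically)."""
    return "".join(min(col, key=lambda b: (-col[b], b)) for col in profile)

# Toy input (hypothetical): the motif windows are written in upper case.
seqs = ["ttACGTaa", "ACGTcccc", "ggggACGA"]
starts = [2, 0, 4]  # known starting positions s
profile = build_profile(seqs, starts, width=4)
print(consensus(profile))
```

With the starts fixed, the whole computation is a single counting pass, which is why the hard part of the problem is finding the starts.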

4 Notations
Set of symbols: the alphabet (for DNA, {A, C, G, T})
Sequences: S = {S1, S2, …, SN}
Starting positions of motifs: A = {a1, a2, …, aN}
Motif model (θ): qij = P(symbol at the i-th position = j)
Background model: pj = P(symbol = j)
Count of symbols in each column: cij = count of symbol j in the i-th column of the aligned region

5 Motif Finding Problem
Problem: find the starting positions A and the model parameters θ simultaneously so as to maximize the posterior probability P(A, θ | S).
By Bayes’ theorem, P(A, θ | S) is proportional to P(S | A, θ) P(A, θ), so under a uniform prior this is equivalent to maximizing the likelihood P(S | A, θ).

6 Equivalent Scoring Function
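Maximize the log-odds ratio of the motif model against the background model, summed over the columns of the aligned region:
F = Σi Σj cij log(qij / pj)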
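A small numeric sketch of the scoring function, assuming it takes the common form F = Σi Σj cij log(qij / pj) built from the counts cij, the motif model qij, and the background pj defined in the notation slide. The function name `log_odds_score` and the toy numbers are hypothetical.

```python
import math

def log_odds_score(counts, q, p):
    """F = sum over columns i and symbols j of c_ij * log(q_ij / p_j)."""
    return sum(
        c_ij * math.log(q[i][j] / p[j])
        for i, column in enumerate(counts)
        for j, c_ij in column.items()
        if c_ij > 0  # symbols that never occur contribute nothing
    )

# Uniform background and a two-column toy alignment of four sequences.
p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
counts = [{"A": 4, "C": 0, "G": 0, "T": 0},
          {"A": 0, "C": 2, "G": 2, "T": 0}]
q = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
     {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}]

score = log_odds_score(counts, q, p)
print(round(score, 3))
```

A column that looks like the background (qij = pj) adds zero to F, so the score rewards only columns that are more conserved than chance.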

7 Sampling and optimization
To maximize a non-negative function f(x):
Brute-force method: try all possible x.
Sampling method: sample x from a probability distribution p(x) proportional to f(x).
Idea: if xmax is the argmax of f(x), then it is also the argmax of p(x), so xmax is the single most likely value to be drawn and repeated sampling tends to visit it.
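The sampling idea can be checked on a toy problem. In the sketch below the objective `f`, the candidate set `xs`, and the sample size are all hypothetical choices; the point is only that the most frequently drawn value coincides with the brute-force argmax when draws are made with probability proportional to f(x).

```python
import math
import random
from collections import Counter

def f(x):
    """Hypothetical non-negative objective with its maximum at x = 7."""
    return math.exp(-((x - 7) ** 2) / 2)

xs = list(range(15))
rng = random.Random(0)

# Sampling method: draw x with probability proportional to f(x).
draws = rng.choices(xs, weights=[f(x) for x in xs], k=10_000)
most_sampled = Counter(draws).most_common(1)[0][0]

# Brute-force method, for comparison: evaluate f everywhere.
brute_force = max(xs, key=f)
print(most_sampled, brute_force)
```

Here brute force is trivial because there are only 15 candidates. Sampling becomes attractive when the candidate set is too large to enumerate, as with all combinations of motif starting positions.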

8 Gibbs Sampling
Idea: a joint distribution may be hard to sample from, but it may be easy to sample from the conditional distributions in which all variables are fixed except one.
To sample from p(x1, x2, …, xn), let each state of a Markov chain represent an assignment (x1, x2, …, xn). Each move changes a single coordinate xi, drawing its new value from p(xi | x1, …, xi-1, xi+1, …, xn).
Gibbs sampling is a Markov Chain Monte Carlo (MCMC) method.
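A minimal sketch of this scheme on a joint distribution small enough to verify by hand. The table `JOINT` over two binary variables and the helper names are assumptions made for illustration. Each update resamples one coordinate from its conditional given the other, and the visit frequencies of the chain should approach the joint probabilities.

```python
import random
from collections import Counter

# Hypothetical joint distribution p(x1, x2) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def conditional(joint, state, i):
    """Return [p(x_i = 0 | rest), p(x_i = 1 | rest)] for the given state."""
    weights = [joint[state[:i] + (v,) + state[i + 1:]] for v in (0, 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gibbs(joint, n_sweeps, rng):
    """Run n_sweeps full sweeps and return the visit frequency of each state."""
    state = (0, 0)  # arbitrary starting state; no burn-in in this sketch
    visits = Counter()
    for _ in range(n_sweeps):
        for i in range(len(state)):
            probs = conditional(joint, state, i)
            v = rng.choices((0, 1), weights=probs)[0]
            state = state[:i] + (v,) + state[i + 1:]
        visits[state] += 1
    return {s: visits[s] / n_sweeps for s in joint}

freq = gibbs(JOINT, 50_000, random.Random(0))
for s in sorted(JOINT):
    print(s, JOINT[s], round(freq[s], 3))
```

The sampler never evaluates the joint distribution as a whole; it only needs ratios of joint probabilities that differ in one coordinate, which is what makes the approach practical for large state spaces.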

9 Gibbs Sampling

10 Gibbs Sampling in Motif Finding
Randomly initialize A0.
Repeat:
(1) Randomly choose a sequence z from S; set A* = At \ az; compute θt = estimator of θ given S and A*.
(2) Sample az according to P(az = x), which is proportional to Qx/Px; update At+1 = A* ∪ {x}.
Select the At that maximizes F.
Qx: the probability of generating x according to θt.
Px: the probability of generating x according to the background model.
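The loop above can be sketched as follows. This is an illustrative reading of the slide, not the presenter's implementation. Assumptions: a uniform background model, a pseudocount of 1 per symbol when estimating θ, a motif width of 8 chosen for the demo, a fixed iteration count, and F computed as the summed log-odds of the aligned windows. All function names are hypothetical. The input is the sequence set shown on the problem slide.

```python
import math
import random

ALPHABET = "acgt"
BACKGROUND = {b: 0.25 for b in ALPHABET}  # assumption: uniform background p_j

def estimate_theta(windows, width, pseudo=1.0):
    """Column-wise symbol frequencies of the windows, smoothed by pseudocounts."""
    total = len(windows) + pseudo * len(ALPHABET)
    theta = []
    for i in range(width):
        col = {b: pseudo for b in ALPHABET}
        for w in windows:
            col[w[i]] += 1
        theta.append({b: c / total for b, c in col.items()})
    return theta

def log_ratio(window, theta):
    """log(Qx / Px) for one candidate window x."""
    return sum(math.log(theta[i][b] / BACKGROUND[b]) for i, b in enumerate(window))

def score(seqs, starts, width):
    """F: summed log-odds of all aligned windows under theta estimated from them."""
    windows = [s[a:a + width] for s, a in zip(seqs, starts)]
    theta = estimate_theta(windows, width)
    return sum(log_ratio(w, theta) for w in windows)

def gibbs_motif(seqs, width, n_iter, rng):
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]  # random A0
    best, best_score = list(starts), score(seqs, starts, width)
    for _ in range(n_iter):
        z = rng.randrange(len(seqs))  # (1) choose a sequence z
        others = [s[a:a + width]
                  for k, (s, a) in enumerate(zip(seqs, starts)) if k != z]
        theta = estimate_theta(others, width)  # theta_t from A* = A_t \ a_z
        positions = list(range(len(seqs[z]) - width + 1))
        weights = [math.exp(log_ratio(seqs[z][x:x + width], theta))
                   for x in positions]
        starts[z] = rng.choices(positions, weights=weights)[0]  # (2) sample a_z
        current = score(seqs, starts, width)
        if current > best_score:  # remember the A_t that maximizes F
            best, best_score = list(starts), current
    return best, best_score

seqs = [
    "cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat",
    "agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc",
    "aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt",
    "agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca",
    "ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc",
]
best, best_score = gibbs_motif(seqs, width=8, n_iter=500, rng=random.Random(1))
for s, a in zip(seqs, best):
    print(a, s[a:a + 8])
```

Because the procedure is stochastic, different seeds can return different alignments. Running several chains from different random initializations and keeping the highest-scoring result is a common safeguard against a chain that settles on a shifted or spurious motif.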

11 Estimator of θ
Given an alignment A, i.e. the starting positions of the motifs, θ can be estimated by its MLE with smoothing (e.g. a Dirichlet prior with parameters bj):
qij = (cij + bj) / (n + B), where n is the number of aligned sequences and B = Σj bj.
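A worked example of a smoothed estimate, assuming the usual pseudocount form qij = (cij + bj) / (n + B), where n is the number of aligned sequences and B is the sum of the pseudocounts bj. The function name `smoothed_profile` and the numbers are hypothetical.

```python
def smoothed_profile(counts, pseudo):
    """q_ij = (c_ij + b_j) / (n + B), with n = sum_j c_ij and B = sum_j b_j."""
    B = sum(pseudo.values())
    theta = []
    for column in counts:
        n = sum(column.values())  # number of aligned sequences
        theta.append({j: (column[j] + pseudo[j]) / (n + B) for j in pseudo})
    return theta

# One motif column observed in four aligned sequences: A, A, A, G.
counts = [{"A": 3, "C": 0, "G": 1, "T": 0}]
pseudo = {"A": 0.5, "C": 0.5, "G": 0.5, "T": 0.5}  # b_j, so B = 2

theta = smoothed_profile(counts, pseudo)
print(theta[0])
```

Without smoothing, C and T would receive probability zero, and any candidate window containing them at this position could never be sampled. The pseudocounts keep every qij strictly positive.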

12 The Motif Finding Problem
Given a set of DNA sequences:
cctgatagacgctatctggctatccacgtacgtaggtcctctgtgcgaatctatgcgtttccaaccat
agtactggtgtacatttgatacgtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc
aaacgtacgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt
agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtacgtataca
ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaacgtacgtc
Find the motif in each of the individual sequences.

13 Gene Regulation
Transcription factor binding sites, or motif instances.
[Figure: a transcription factor (TF) and Genes 1 to 4, with the motif instances CACGTGT, CACGTGA, CAAGTGA, CAGGTGA.]

14 Evolutionary Conservation
CACGTGACC
CACGTGAAC
CACGTGAAC

15 Overview of TGS
How did the motifs evolve? How to find the ancestral motif instances?
[Figure legend] Colored lines: regulatory regions of genes. Colored boxes: motif instances.

16 How to find the ancestral motif instances?
[Figure: ancestral motif profile over A, C, G, T, shown with the motif instances CACGTGAAC and CACGTGACC.]

17 How did the motifs evolve?
[Figure: two 4×4 substitution matrices over A, C, G, T: a background substitution matrix and a motif substitution matrix.]

18 Evolution of motifs
Distant species: 250 million years

19 Implementation: Overview of Gibbs Sampler
Iteratively sample each parameter from its conditional distribution while the other parameters are held fixed.
[Equations: the conditional distributions to draw from; not preserved.]

20 Implementation: Parameters
Ancestral motif weight matrix at the root
Background distribution (multinomial)
Probability that a gene in the i-th species will contain the motif
Motif width
Background substitution matrix for the i-th branch
Motif substitution matrix for the i-th branch

21 Implementation: Prior distribution
Beta(1,1)
Poisson distribution

22 Implementation: Initialization
Parameters are sampled from their prior distributions.
For each current species, motif instances are sampled directly from its sequences.
Each motif instance in an ancestral species is set to a randomly chosen motif instance of one of its immediate children.

23 Implementation: Motif instance updating
Updating motif instances in ancestral species
Updating motif instances in current species

24 Implementation: Updating motif instances in ancestral species
Ancestral motif weight matrix (rows A, C, G, T).
Child motif instances: M11 = CCCGTGACC, M12 = CACGTGAAC.
At the 2nd position, M11 has C and M12 has A.
2nd position: A: 0.932…, C: 0.067, G: 8.4e-6, T: 2.5e-4

25 Implementation: Updating motif instances for current species
Updated ancestral motif instance: CACTTGAAC
M11: …CACACCACGTGAGCTT...
M12: …CACATCACGTGAACTT…

26 Multiple Species?
[Figure: two unknown motif instances (?, ?) and the motif instances CAGGTGATC, CACGTGAAC, CACGTGAAC, CACGTGAAC, CACGTGATC.]

