
1 Gibbs Sampler in Local Multiple Alignment Review by 온 정 헌

2 Topics One. The Gibbs Sampler algorithm in multiple sequence alignment (how the mechanism works) (Lawrence et al., Science 1993; J. Liu et al., JASA, 1995) Two. A brief review of the Bayesian missing-data problem and the Gibbs Sampler (the background from which it was developed…)

3 Aim of Local MSA Functionally important sequences are usually conserved (conserved sequences). Locate relatively short patterns shared by otherwise dissimilar sequences. Examples: 1. finding regulatory motifs in the upstream sequences of regulons; 2. finding interaction motifs by aligning protein sequences.

4 Local MSA methods: EM (MEME) http://meme.sdsc.edu/meme/website/ ; Gibbs sampler (AlignACE, Gibbs Motif Sampler) http://bayesweb.wadsworth.org/gibbs/gibbs.html ; HMM (HMMER) http://hmmer.wustl.edu/ ; MACAW ftp://ncbi.nlm.nih.gov/pub/macaw

5 Gibbs Sampler Algorithm In Practice (Predictive Update version of Gibbs Sampler)

6 Problem Description Given a set of N sequences S_1,…,S_N of lengths n_k (k=1,…,N), identify a single pattern of fixed width W within each of the N input sequences. A = {a_k} (k=1,…,N) is the set of starting positions for the common pattern within each sequence, with a_k = 1,…,n_k−W+1. Objective: to find the "best," defined as the most probable, common pattern.

7 Algorithm – Initialization (1) Choose random starting positions {a_k} within the various sequences. A = {a_k} (k=1,…,N) is the set of starting positions for the common pattern within each sequence, with a_k = 1,…,n_k−W+1.
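As an illustration, a minimal Python sketch of this initialization, assuming the sequences are plain strings over A/T/G/C and using 0-based positions; the function name init_positions and the use of Python's random module are illustrative choices, not part of the original programs.

```python
import random

def init_positions(seqs, W, rng=random):
    """Step (1): pick a random motif start a_k in every sequence.
    0-based here, so a_k ranges over 0 .. n_k - W (the slide's 1 .. n_k - W + 1)."""
    return [rng.randrange(len(s) - W + 1) for s in seqs]
```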

8 Algorithm – Predictive Update (2) One of the N sequences, Z, is chosen either at random or in a specified order. The pattern description q_ij and the background frequencies q_0j are then calculated, excluding Z.

9 q_ij (i=1,…,W; j = A,T,G,C): q_ij = (c_ij + b_j) / (N − 1 + B), where c_ij is the count of base j at motif position i, b_j is a residue-dependent "pseudocount," and B is the sum of the b_j. The background frequencies q_0j are calculated analogously, with counts taken over all non-motif positions.
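A minimal Python sketch of this predictive-update computation, assuming the alignment is held as a list of start positions; the pseudocount value 0.5 and the function name motif_model are illustrative assumptions, not the original implementation.

```python
def motif_model(seqs, starts, W, exclude, b=None):
    """Compute q_ij (motif) and q_0j (background) from the N-1 sequences
    other than `exclude`, using q_ij = (c_ij + b_j) / (N - 1 + B)."""
    bases = "ATGC"
    b = b or {j: 0.5 for j in bases}                    # pseudocounts (illustrative value)
    B = sum(b.values())
    N = len(seqs)
    q = [{j: b[j] for j in bases} for _ in range(W)]    # start each column from b_j
    bg = {j: b[j] for j in bases}
    for k, (s, a) in enumerate(zip(seqs, starts)):
        if k == exclude:
            continue
        for i in range(W):
            q[i][s[a + i]] += 1                         # accumulate c_ij
        for pos, ch in enumerate(s):                    # background: non-motif positions
            if not (a <= pos < a + W):
                bg[ch] += 1
    for i in range(W):
        for j in bases:
            q[i][j] /= (N - 1 + B)                      # q_ij = (c_ij + b_j) / (N - 1 + B)
    total = sum(bg.values())
    q0 = {j: bg[j] / total for j in bases}              # q_0j computed analogously
    return q, q0
```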

10 Example (N=6, W=10): in the schematic, each row of X's is a sequence with its current motif position marked M, and Z is the sequence left out. The five motif segments from the remaining sequences are AGGAGCAAGA, ACATCCAAGT, TCATGATAGT, TGTAATGTCA, AATGTGGTCA, giving q_1A = 3/5, q_2G = 2/5, … (pseudocounts omitted here for simplicity). Note q_1G = 0, which is a problem: pseudocounts are essential.

11 Resulting parameters: Q, the 4×W matrix with rows A, T, G, C and columns 1,…,W, whose entries are the q_ij; Q_0, the 4×1 vector (q_0A, q_0T, q_0G, q_0C); and A = {a_1, a_2, …, a_N}.

12 Algorithm – Sampling step (3) Every possible segment of width W within sequence Z is considered. The probabilities Q_x and B_x of generating each segment x according to q_ij and q_0j, respectively, are calculated. The weight A_x = Q_x/B_x is assigned to segment x, and with each segment so weighted, a random one is selected. Its position then becomes the new a_z. Iterate.
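A minimal Python sketch of this sampling step under the same assumptions as above; sample_position is an illustrative name, and it expects the q, q0 produced by a routine such as the motif_model sketch.

```python
import random

def sample_position(z_seq, q, q0, W, rng=random):
    """Weight every width-W segment x of the left-out sequence Z by
    A_x = Q_x / B_x and draw a new start in proportion to those weights."""
    weights = []
    for x in range(len(z_seq) - W + 1):
        Qx = Bx = 1.0
        for i in range(W):
            ch = z_seq[x + i]
            Qx *= q[i][ch]        # probability of the segment under the motif model
            Bx *= q0[ch]          # probability under the background model
        weights.append(Qx / Bx)   # A_x
    # draw one start index with probability proportional to its weight
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```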

13 Example: Z = ATGGCTAAGCCATTAATCGC (the sequence left out; the other rows of the schematic keep their current motif positions M). For the width-10 segment x starting at position 3 of Z (GGCTAAGCCA), Q_x = q_1G × q_2G × q_3C × q_4T × q_5A × q_6A × q_7G × q_8C × q_9C × q_10A and B_x = q_0G × q_0G × q_0C × q_0T × q_0A × q_0A × q_0G × q_0C × q_0C × q_0A, so A_x = Q_x/B_x. Select a set of a_k's that maximizes the product of these ratios, or equivalently F = Σ_{1≤i≤W} Σ_{j∈{A,T,G,C}} c_ij log(q_ij / q_0j).
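A small Python sketch of the score F from this slide, under the same assumptions as the earlier sketches; recomputing q_ij and q_0j inside the function and the 0.5 pseudocount are illustrative choices so that every logarithm is defined.

```python
import math

def alignment_score_F(seqs, starts, W, pseudo=0.5):
    """F = sum_{i=1..W} sum_{j in {A,T,G,C}} c_ij * log(q_ij / q_0j)."""
    bases = "ATGC"
    c = [{j: 0 for j in bases} for _ in range(W)]       # motif counts c_ij
    bg = {j: pseudo for j in bases}                     # background counts + pseudocount
    for s, a in zip(seqs, starts):
        for i in range(W):
            c[i][s[a + i]] += 1
        for pos, ch in enumerate(s):
            if not (a <= pos < a + W):
                bg[ch] += 1
    N, B = len(seqs), 4 * pseudo
    total_bg = sum(bg.values())
    F = 0.0
    for i in range(W):
        for j in bases:
            q_ij = (c[i][j] + pseudo) / (N + B)
            q_0j = bg[j] / total_bg
            F += c[i][j] * math.log(q_ij / q_0j)
    return F
```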

14 Bayesian missing data problem and Gibbs Sampler – an interim review

15 Simulation Given a joint pdf f(x,y,z), how can E[X] be computed? In principle f(x) = ∫∫ f(x,y,z) dz dy and E[X] = ∫ x f(x) dx. But what if f(x) = ∫∫ f(x,y,z) dz dy is hard to compute? … Simulation (sampling): generate x_1, x_2, x_3, …, x_1000 and approximate E[X] by (1/m) Σ x_i.
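A toy Python illustration of this Monte Carlo idea; the exponential distribution used here is my own assumption, chosen only because its true mean is known.

```python
import random

# Approximate E[X] by the sample mean (1/m) * sum(x_i) instead of integrating.
# Toy target (assumption): X ~ Exponential(rate=2), whose true mean is 0.5.
m = 1000
samples = [random.expovariate(2.0) for _ in range(m)]
print(sum(samples) / m)   # close to 0.5 for large m
```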

16 MCMC High-dimensional joint densities are completely characterized by lower-dimensional conditional densities. Or: a big/hard problem can be broken down into an interrelated series of similar/easier problems. P(x_1,x_2,…,x_n) = P(x_n|x_{n−1},…,x_1) P(x_{n−1}|x_{n−2},…,x_1) … P(x_2|x_1) P(x_1) = P(x_n|x_{n−1}) P(x_{n−1}|x_{n−2}) … P(x_2|x_1) P(x_1) (Markov chain). The Metropolis–Hastings algorithm, the Gibbs Sampler, and related methods solve the problem by constructing such a Markov chain, i.e., they simulate a Markov chain that converges in distribution to a posterior distribution.

17 Gibbs Sampler (two-component) To sample from (X,Y): choose Y_0, set t=0; generate X_t ~ f(x|y_t); generate Y_{t+1} ~ f(y|x_t); set t = t+1; iterate. The pairs (Y_0, X_0), (Y_1, X_1), (Y_2, X_2), …, (Y_k, X_k), … converge to π(Y, X), the invariant (stationary) distribution.
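A runnable Python sketch of this two-component scheme on a toy target that is my own assumption, not from the slides: a bivariate standard normal with correlation rho, whose full conditionals are the normals used below.

```python
import random

def gibbs_bivariate_normal(rho, n_iter, y0=0.0, rng=random):
    """Two-component Gibbs sampler: alternate X_t ~ f(x|y_t), Y_{t+1} ~ f(y|x_t).
    Toy target: (X, Y) bivariate standard normal with correlation rho, so
    X|Y=y ~ N(rho*y, 1-rho^2) and Y|X=x ~ N(rho*x, 1-rho^2)."""
    sd = (1.0 - rho * rho) ** 0.5
    y, chain = y0, []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)    # X_t ~ f(x | y_t)
        y = rng.gauss(rho * x, sd)    # Y_{t+1} ~ f(y | x_t)
        chain.append((x, y))
    return chain

# after a burn-in, the pairs behave like draws from the invariant distribution pi(X, Y)
draws = gibbs_bivariate_normal(rho=0.8, n_iter=5000)[500:]
```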

18 The two-dimensional case (x_1, x_2)

19 Gibbs Sampler

20

21 Bayes' theorem (a brief aside): Posterior ∝ Likelihood × Prior

22 Bayesian missing data problem θ: parameter of interest. X = {x_1,…,x_n}: a set of complete i.i.d. observations from a density that depends upon θ, π(X|θ). π(θ|X) = Π_{i=1,…,n} π(x_i|θ) π(θ) / π(X). In practical situations, the x_i may not be completely observed. Assuming the unobserved values are missing completely at random, let X = (Y,Z) with x_i = (y_i, z_i), i=1,…,n, where y_i is the observed part and z_i the missing part. Then π(θ|Y) = ∫ π(θ|Y,Z) π(Z|Y) dZ → Imputation.

23 Multiple values Z^(1),…,Z^(m) are drawn from π(Z|Y) to form m complete data sets. With these imputed data sets and the ergodicity theorem, π(θ|Y) ≈ (1/m){π(θ|Y,Z^(1)) + … + π(θ|Y,Z^(m))}. But in most applied problems it is impossible to draw Z from π(Z|Y) directly.

24 Tanner and Wong's data augmentation (DA), which applies the Gibbs Sampler to draw multiple θ's and multiple Z's jointly from π(θ,Z|Y), copes with this problem by evolving a Markov chain: by iterating between drawing θ from π(θ|Y,Z) and drawing Z from π(Z|θ,Y), DA constructs a Markov chain whose equilibrium distribution is π(θ,Z|Y).

25 Collapsed Gibbs Sampler (J. Liu) Consider sampling from π(θ|D), with θ = (θ_1, θ_2, θ_3). The original Gibbs Sampler versus the collapsed Gibbs Sampler (J. Liu): the collapsed version is easier to compute.

26 Back to Multiple Sequence Alignment: what does it have to do with the Bayesian missing data problem…?

27 Resulting parameters, revisited: Q, the 4×W matrix of the q_ij (rows A, T, G, C; columns 1,…,W), together with Q_0 = (q_0A, q_0T, q_0G, q_0C): the parameters of interest. A = {a_1, a_2, …, a_N}: the missing data! B = the given sequences: the observed data!

28 Revisiting… By iterating between drawing Q from π(Q|A,B) and drawing A from π(A|Q,B), DA constructs a Markov chain whose equilibrium distribution is π(Q,A|B). → Collapsed Gibbs Sampler: sample directly from π(A|B), with Q integrated out.

29 Original Gibbs Sampler algorithm Step 0. Choose an arbitrary starting point A_0 = (a_1,0, a_2,0, …, a_N,0, Q_0). Step 1. Generate A_{t+1} = (a_1,t+1, a_2,t+1, …, a_N,t+1, Q_{t+1}) as follows: generate a_1,t+1 ~ π(a_1 | a_2,t, …, a_N,t, Q_t, B); generate a_2,t+1 ~ π(a_2 | a_1,t+1, a_3,t, …, a_N,t, Q_t, B); … generate a_N,t+1 ~ π(a_N | a_1,t+1, a_2,t+1, …, a_{N−1},t+1, Q_t, B); generate Q_{t+1} ~ π(Q | a_1,t+1, a_2,t+1, …, a_N,t+1, B). Step 2. Set t = t+1 and go to Step 1.

30 Collapsed Gibbs Sampler Step 0. Choose an arbitrary starting point A_0 = (a_1,0, a_2,0, …, a_N,0). Step 1. Generate A_{t+1} = (a_1,t+1, a_2,t+1, …, a_N,t+1) as follows: generate a_1,t+1 ~ π(a_1 | a_2,t, …, a_N,t, B); generate a_2,t+1 ~ π(a_2 | a_1,t+1, a_3,t, …, a_N,t, B); … generate a_N,t+1 ~ π(a_N | a_1,t+1, a_2,t+1, …, a_{N−1},t+1, B); generate Q_{t+1} ~ π(Q | a_1,t+1, a_2,t+1, …, a_N,t+1, B). Step 2. Set t = t+1 and go to Step 1.
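Putting the earlier sketches together, a minimal Python loop in the spirit of this collapsed scheme; it reuses the illustrative init_positions, motif_model, and sample_position functions sketched above and is not the original Gibbs Motif Sampler code.

```python
import random

def collapsed_gibbs_motif(seqs, W, n_iter=500, rng=random):
    """Collapsed scheme: Q is integrated out, so each a_k is redrawn from its
    predictive distribution given the other starts A[-k] and the sequences B."""
    starts = init_positions(seqs, W, rng)                        # Step 0: arbitrary A_0
    for _ in range(n_iter):                                      # Step 1 repeated
        for k in range(len(seqs)):
            q, q0 = motif_model(seqs, starts, W, exclude=k)      # predictive update
            starts[k] = sample_position(seqs[k], q, q0, W, rng)  # sampling step
    return starts
```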

31 Predictive Update Version? Predictive distribution π(A|B). With A_[-k] = {a_1, a_2, …, a_{k−1}, a_{k+1}, …, a_N} and B the sequences, π(a_k = x | A_[-k], B) is computed by integrating out Q … ∝ Π_{1≤i≤W} (q_ij / q_0j), the product over the motif positions of the segment starting at x ← this is why we calculated A_x.

32 Phase shift To avoid the sampler settling on an alignment that is shifted relative to the optimal pattern, after every Mth iteration, for example, one may compare the current set of a_k with the sets shifted left and right by up to a certain number of letters. Probability ratios are calculated for all of these shifted alignments, and a random selection is made among them with appropriate corresponding weights.
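A possible Python sketch of such a phase-shift move, under my own assumptions: the shifted alignments are weighted by exp(F) using the alignment_score_F sketch above, and the function name and the max_shift default are illustrative.

```python
import math
import random

def phase_shift(seqs, starts, W, max_shift=3, rng=random):
    """Compare the current alignment with versions shifted by d = -max_shift..+max_shift
    letters and pick one shift at random, weighted by its relative probability."""
    shifts, scores = [], []
    for d in range(-max_shift, max_shift + 1):
        shifted = [a + d for a in starts]
        if all(0 <= a <= len(s) - W for a, s in zip(shifted, seqs)):  # stay in range
            shifts.append(d)
            scores.append(alignment_score_F(seqs, shifted, W))
    top = max(scores)
    weights = [math.exp(f - top) for f in scores]   # subtract max for numerical stability
    d = rng.choices(shifts, weights=weights, k=1)[0]
    return [a + d for a in starts]
```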

33 Convergence behavior

34 Features included in AlignACE: automatic detection of variable pattern widths; multiple motif instances per input sequence; both strands are now considered; the near-optimum sampling method was improved; the model for base background frequencies was fixed to the background nucleotide frequencies in the genome being considered.

35 Summary Dempster, the EM algorithm, 1977. Tanner & Wong, posterior computation by Data Augmentation (using the Gibbs Sampler), JASA, 1987. Lawrence, Liu, Neuwald: the collapsed Gibbs Sampler algorithm in multiple sequence alignment, 1993, 1994, 1995. More recently, many motif search programs based on the Gibbs Sampler (Gibbs Motif Sampler, AlignACE, …).

