
1 Gibbs Sampler in Local Multiple Alignment Review by 온 정 헌

2 Topics One. The Gibbs Sampler algorithm in multiple sequence alignment (how the mechanism works) (Lawrence et al., Science 1993; J. Liu et al., JASA, 1995) Two. A brief review of the Bayesian missing-data problem and the Gibbs Sampler (the background from which it was developed…)

3 Aim of Local MSA Functionally important sequences are usually conserved (conserved sequences). Locate relatively short patterns shared by otherwise dissimilar sequences. Examples: 1. finding regulatory motifs in the upstream sequences of regulons; 2. finding interaction motifs by aligning protein sequences.

4 Local MSA methods: EM (MEME) http://meme.sdsc.edu/meme/website/ ; Gibbs sampler (AlignACE, Gibbs Motif Sampler) http://bayesweb.wadsworth.org/gibbs/gibbs.html ; HMM (HMMER) http://hmmer.wustl.edu/ ; MACAW ftp://ncbi.nlm.nih.gov/pub/macaw

5 Gibbs Sampler Algorithm In Practice (Predictive Update version of Gibbs Sampler)

6 Problem Description Given a set of N sequences S_1,…,S_N of lengths n_k (k=1,…,N), identify a single pattern of fixed width W within each of the N input sequences. A = {a_k} (k=1,…,N) is the set of starting positions for the common pattern within each sequence, with a_k = 1,…,n_k−W+1. Objective: to find the "best," defined as the most probable, common pattern.

7 Algorithm – Initialization (1) Choose random starting positions {a_k} within the various sequences. A = {a_k} (k=1,…,N) is the set of starting positions for the common pattern within each sequence, with a_k = 1,…,n_k−W+1.
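As an illustration, a minimal Python sketch of this initialization, assuming the sequences are plain strings over A/T/G/C and using 0-based positions; the function name init_positions and the use of Python's random module are illustrative choices, not part of the original programs.

```python
import random

def init_positions(seqs, W, rng=random):
    """Step (1): pick a random motif start a_k in every sequence.
    0-based here, so a_k ranges over 0 .. n_k - W (the slide's 1 .. n_k - W + 1)."""
    return [rng.randrange(len(s) - W + 1) for s in seqs]
```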

8 Algorithm – Predictive Update (2) One of the N sequences, Z, is chosen either at random or in a specified order. The pattern description q_ij and the background frequencies q_0j are then calculated, excluding Z.

9 q_ij (i=1,…,W; j = A,T,G,C): q_ij = (c_ij + b_j) / (N − 1 + B), where c_ij is the count of base j at motif position i, b_j is a residue-dependent "pseudocount," and B is the sum of the b_j. The background frequencies q_0j are calculated analogously, with counts taken over all non-motif positions.
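A minimal Python sketch of this predictive-update computation, assuming the alignment is held as a list of start positions; the pseudocount value 0.5 and the function name motif_model are illustrative assumptions, not the original implementation.

```python
def motif_model(seqs, starts, W, exclude, b=None):
    """Compute q_ij (motif) and q_0j (background) from the N-1 sequences
    other than `exclude`, using q_ij = (c_ij + b_j) / (N - 1 + B)."""
    bases = "ATGC"
    b = b or {j: 0.5 for j in bases}                    # pseudocounts (illustrative value)
    B = sum(b.values())
    N = len(seqs)
    q = [{j: b[j] for j in bases} for _ in range(W)]    # start each column from b_j
    bg = {j: b[j] for j in bases}
    for k, (s, a) in enumerate(zip(seqs, starts)):
        if k == exclude:
            continue
        for i in range(W):
            q[i][s[a + i]] += 1                         # accumulate c_ij
        for pos, ch in enumerate(s):                    # background: non-motif positions
            if not (a <= pos < a + W):
                bg[ch] += 1
    for i in range(W):
        for j in bases:
            q[i][j] /= (N - 1 + B)                      # q_ij = (c_ij + b_j) / (N - 1 + B)
    total = sum(bg.values())
    q0 = {j: bg[j] / total for j in bases}              # q_0j computed analogously
    return q, q0
```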

10 Example (N=6, W=10): in the schematic, each row of X's is a sequence with its current motif position marked M, and Z is the sequence left out. The five motif segments from the remaining sequences are AGGAGCAAGA, ACATCCAAGT, TCATGATAGT, TGTAATGTCA, AATGTGGTCA, giving q_1A = 3/5, q_2G = 2/5, … (pseudocounts omitted here for simplicity). Note q_1G = 0, which is a problem: pseudocounts are essential.

11 Resulting parameters: Q, the 4×W matrix with rows A, T, G, C and columns 1,…,W, whose entries are the q_ij; Q_0, the 4×1 vector (q_0A, q_0T, q_0G, q_0C); and A = {a_1, a_2, …, a_N}.

12 Algorithm – Sampling step (3) Every possible segment of width W within sequence Z is considered. The probabilities Q_x and B_x of generating each segment x according to q_ij and q_0j, respectively, are calculated. The weight A_x = Q_x/B_x is assigned to segment x, and with each segment so weighted, a random one is selected. Its position then becomes the new a_z. Iterate.
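A minimal Python sketch of this sampling step under the same assumptions as above; sample_position is an illustrative name, and it expects the q, q0 produced by a routine such as the motif_model sketch.

```python
import random

def sample_position(z_seq, q, q0, W, rng=random):
    """Weight every width-W segment x of the left-out sequence Z by
    A_x = Q_x / B_x and draw a new start in proportion to those weights."""
    weights = []
    for x in range(len(z_seq) - W + 1):
        Qx = Bx = 1.0
        for i in range(W):
            ch = z_seq[x + i]
            Qx *= q[i][ch]        # probability of the segment under the motif model
            Bx *= q0[ch]          # probability under the background model
        weights.append(Qx / Bx)   # A_x
    # draw one start index with probability proportional to its weight
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```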

13 Example: Z = ATGGCTAAGCCATTAATCGC (the sequence left out; the other rows of the schematic keep their current motif positions M). For the width-10 segment x starting at position 3 of Z (GGCTAAGCCA), Q_x = q_1G × q_2G × q_3C × q_4T × q_5A × q_6A × q_7G × q_8C × q_9C × q_10A and B_x = q_0G × q_0G × q_0C × q_0T × q_0A × q_0A × q_0G × q_0C × q_0C × q_0A, so A_x = Q_x/B_x. Select a set of a_k's that maximizes the product of these ratios, or equivalently F = Σ_{1≤i≤W} Σ_{j∈{A,T,G,C}} c_ij log(q_ij / q_0j).
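A small Python sketch of the score F from this slide, under the same assumptions as the earlier sketches; recomputing q_ij and q_0j inside the function and the 0.5 pseudocount are illustrative choices so that every logarithm is defined.

```python
import math

def alignment_score_F(seqs, starts, W, pseudo=0.5):
    """F = sum_{i=1..W} sum_{j in {A,T,G,C}} c_ij * log(q_ij / q_0j)."""
    bases = "ATGC"
    c = [{j: 0 for j in bases} for _ in range(W)]       # motif counts c_ij
    bg = {j: pseudo for j in bases}                     # background counts + pseudocount
    for s, a in zip(seqs, starts):
        for i in range(W):
            c[i][s[a + i]] += 1
        for pos, ch in enumerate(s):
            if not (a <= pos < a + W):
                bg[ch] += 1
    N, B = len(seqs), 4 * pseudo
    total_bg = sum(bg.values())
    F = 0.0
    for i in range(W):
        for j in bases:
            q_ij = (c[i][j] + pseudo) / (N + B)
            q_0j = bg[j] / total_bg
            F += c[i][j] * math.log(q_ij / q_0j)
    return F
```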

14 Bayesian missing data problem and Gibbs Sampler – an interim review

15 Simulation Given a joint pdf f(x,y,z), how can E[X] be computed? In principle f(x) = ∫∫ f(x,y,z) dz dy and E[X] = ∫ x f(x) dx. But what if f(x) = ∫∫ f(x,y,z) dz dy is hard to compute? … Simulation (sampling): generate x_1, x_2, x_3, …, x_1000 and approximate E[X] by (1/m) Σ x_i.
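A toy Python illustration of this Monte Carlo idea; the exponential distribution used here is my own assumption, chosen only because its true mean is known.

```python
import random

# Approximate E[X] by the sample mean (1/m) * sum(x_i) instead of integrating.
# Toy target (assumption): X ~ Exponential(rate=2), whose true mean is 0.5.
m = 1000
samples = [random.expovariate(2.0) for _ in range(m)]
print(sum(samples) / m)   # close to 0.5 for large m
```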

16 MCMC High-dimensional joint densities are completely characterized by lower-dimensional conditional densities. Or: a big/hard problem can be broken down into an interrelated series of similar/easier problems. P(x_1,x_2,…,x_n) = P(x_n|x_{n−1},…,x_1) P(x_{n−1}|x_{n−2},…,x_1) … P(x_2|x_1) P(x_1) = P(x_n|x_{n−1}) P(x_{n−1}|x_{n−2}) … P(x_2|x_1) P(x_1) (Markov chain). The Metropolis–Hastings algorithm, the Gibbs Sampler, and related methods solve the problem by constructing such a Markov chain, i.e., they simulate a Markov chain that converges in distribution to a posterior distribution.

17 Gibbs Sampler (two-component) To sample from (X,Y): choose Y_0, set t=0; generate X_t ~ f(x|y_t); generate Y_{t+1} ~ f(y|x_t); set t = t+1; iterate. The pairs (Y_0, X_0), (Y_1, X_1), (Y_2, X_2), …, (Y_k, X_k), … converge to π(Y, X), the invariant (stationary) distribution.
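A runnable Python sketch of this two-component scheme on a toy target that is my own assumption, not from the slides: a bivariate standard normal with correlation rho, whose full conditionals are the normals used below.

```python
import random

def gibbs_bivariate_normal(rho, n_iter, y0=0.0, rng=random):
    """Two-component Gibbs sampler: alternate X_t ~ f(x|y_t), Y_{t+1} ~ f(y|x_t).
    Toy target: (X, Y) bivariate standard normal with correlation rho, so
    X|Y=y ~ N(rho*y, 1-rho^2) and Y|X=x ~ N(rho*x, 1-rho^2)."""
    sd = (1.0 - rho * rho) ** 0.5
    y, chain = y0, []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)    # X_t ~ f(x | y_t)
        y = rng.gauss(rho * x, sd)    # Y_{t+1} ~ f(y | x_t)
        chain.append((x, y))
    return chain

# after a burn-in, the pairs behave like draws from the invariant distribution pi(X, Y)
draws = gibbs_bivariate_normal(rho=0.8, n_iter=5000)[500:]
```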

18 The two-dimensional case (x_1, x_2)

19 Gibbs Sampler

20

21 Bayes' theorem (a brief aside): Posterior ∝ Likelihood × Prior

22 Bayesian missing data problem θ: parameter of interest. X = {x_1,…,x_n}: a set of complete i.i.d. observations from a density that depends upon θ, π(X|θ). π(θ|X) = Π_{i=1,…,n} π(x_i|θ) π(θ) / π(X). In practical situations, the x_i may not be completely observed. Assuming the unobserved values are missing completely at random, let X = (Y,Z) with x_i = (y_i, z_i), i=1,…,n, where y_i is the observed part and z_i the missing part. Then π(θ|Y) = ∫ π(θ|Y,Z) π(Z|Y) dZ → Imputation.

23 Multiple values Z^(1),…,Z^(m) are drawn from π(Z|Y) to form m complete data sets. With these imputed data sets and the ergodicity theorem, π(θ|Y) ≈ (1/m){π(θ|Y,Z^(1)) + … + π(θ|Y,Z^(m))}. But in most applied problems it is impossible to draw Z from π(Z|Y) directly.

24 Tanner and Wong's data augmentation (DA), which applies the Gibbs Sampler to draw multiple θ's and multiple Z's jointly from π(θ,Z|Y), copes with this problem by evolving a Markov chain: by iterating between drawing θ from π(θ|Y,Z) and drawing Z from π(Z|θ,Y), DA constructs a Markov chain whose equilibrium distribution is π(θ,Z|Y).

25 Collapsed Gibbs Sampler (J. Liu) Consider sampling from π(θ|D), with θ = (θ_1, θ_2, θ_3). The original Gibbs Sampler versus the collapsed Gibbs Sampler (J. Liu): the collapsed version is easier to compute.

26 Back to Multiple Sequence Alignment: what does it have to do with the Bayesian missing data problem…?

27 Resulting parameters, revisited: Q, the 4×W matrix of the q_ij (rows A, T, G, C; columns 1,…,W), together with Q_0 = (q_0A, q_0T, q_0G, q_0C): the parameters of interest. A = {a_1, a_2, …, a_N}: the missing data! B = the given sequences: the observed data!

28 Revisiting… By iterating between drawing Q from π(Q|A,B) and drawing A from π(A|Q,B), DA constructs a Markov chain whose equilibrium distribution is π(Q,A|B). → Collapsed Gibbs Sampler: sample directly from π(A|B), with Q integrated out.

29 Original Gibbs Sampler algorithm Step 0. Choose an arbitrary starting point A_0 = (a_1,0, a_2,0, …, a_N,0, Q_0). Step 1. Generate A_{t+1} = (a_1,t+1, a_2,t+1, …, a_N,t+1, Q_{t+1}) as follows: generate a_1,t+1 ~ π(a_1 | a_2,t, …, a_N,t, Q_t, B); generate a_2,t+1 ~ π(a_2 | a_1,t+1, a_3,t, …, a_N,t, Q_t, B); … generate a_N,t+1 ~ π(a_N | a_1,t+1, a_2,t+1, …, a_{N−1},t+1, Q_t, B); generate Q_{t+1} ~ π(Q | a_1,t+1, a_2,t+1, …, a_N,t+1, B). Step 2. Set t = t+1 and go to Step 1.

30 Collapsed Gibbs Sampler Step 0. Choose an arbitrary starting point A_0 = (a_1,0, a_2,0, …, a_N,0). Step 1. Generate A_{t+1} = (a_1,t+1, a_2,t+1, …, a_N,t+1) as follows: generate a_1,t+1 ~ π(a_1 | a_2,t, …, a_N,t, B); generate a_2,t+1 ~ π(a_2 | a_1,t+1, a_3,t, …, a_N,t, B); … generate a_N,t+1 ~ π(a_N | a_1,t+1, a_2,t+1, …, a_{N−1},t+1, B); generate Q_{t+1} ~ π(Q | a_1,t+1, a_2,t+1, …, a_N,t+1, B). Step 2. Set t = t+1 and go to Step 1.
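Putting the earlier sketches together, a minimal Python loop in the spirit of this collapsed scheme; it reuses the illustrative init_positions, motif_model, and sample_position functions sketched above and is not the original Gibbs Motif Sampler code.

```python
import random

def collapsed_gibbs_motif(seqs, W, n_iter=500, rng=random):
    """Collapsed scheme: Q is integrated out, so each a_k is redrawn from its
    predictive distribution given the other starts A[-k] and the sequences B."""
    starts = init_positions(seqs, W, rng)                        # Step 0: arbitrary A_0
    for _ in range(n_iter):                                      # Step 1 repeated
        for k in range(len(seqs)):
            q, q0 = motif_model(seqs, starts, W, exclude=k)      # predictive update
            starts[k] = sample_position(seqs[k], q, q0, W, rng)  # sampling step
    return starts
```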

31 Predictive Update Version? Predictive distribution π(A|B). With A_[-k] = {a_1, a_2, …, a_{k−1}, a_{k+1}, …, a_N} and B the sequences, π(a_k = x | A_[-k], B) is computed by integrating out Q … ∝ Π_{1≤i≤W} (q_ij / q_0j), the product over the motif positions of the segment starting at x ← this is why we calculated A_x.

32 Phase shift To avoid the sampler settling on an alignment that is shifted relative to the optimal pattern, after every Mth iteration, for example, one may compare the current set of a_k with the sets shifted left and right by up to a certain number of letters. Probability ratios are calculated for all of these shifted alignments, and a random selection is made among them with appropriate corresponding weights.
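A possible Python sketch of such a phase-shift move, under my own assumptions: the shifted alignments are weighted by exp(F) using the alignment_score_F sketch above, and the function name and the max_shift default are illustrative.

```python
import math
import random

def phase_shift(seqs, starts, W, max_shift=3, rng=random):
    """Compare the current alignment with versions shifted by d = -max_shift..+max_shift
    letters and pick one shift at random, weighted by its relative probability."""
    shifts, scores = [], []
    for d in range(-max_shift, max_shift + 1):
        shifted = [a + d for a in starts]
        if all(0 <= a <= len(s) - W for a, s in zip(shifted, seqs)):  # stay in range
            shifts.append(d)
            scores.append(alignment_score_F(seqs, shifted, W))
    top = max(scores)
    weights = [math.exp(f - top) for f in scores]   # subtract max for numerical stability
    d = rng.choices(shifts, weights=weights, k=1)[0]
    return [a + d for a in starts]
```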

33 Convergence behavior

34 Features included in AlignACE: automatic detection of variable pattern widths; multiple motif instances per input sequence; both strands are now considered; the near-optimum sampling method was improved; the model for base background frequencies was fixed to the background nucleotide frequencies in the genome being considered.

35 Summary Dempster, the EM algorithm, 1977. Tanner & Wong, posterior computation by Data Augmentation (using the Gibbs Sampler), JASA, 1987. Lawrence, Liu, Neuwald: the collapsed Gibbs Sampler algorithm in multiple sequence alignment, 1993, 1994, 1995. More recently, many motif search programs based on the Gibbs Sampler (Gibbs Motif Sampler, AlignACE, …).

