Genome evolution: a sequence-centric approach Lecture 3: From Trees to HMMs.

Web: www.wisdom.weizmann.ac.il/~atanay/GenomeEvo. Access ppts and exercises directly: /home/atanay/public_html/GenomeEvo/. Subscribe to course messages: amos.tanay@weizmann.ac.il

Course outline: Probabilistic models; Inference; Parameter estimation; Genome structure; Mutations; Population; Inferring selection (Probability, Calculus/Matrix theory, some graph theory, some statistics). Simple tree models; (continuous-time) Markov chains.

Stochastic Processes and Stationary Distributions (Figure: a stationary model vs. a process model along time t.)
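As a concrete illustration of a stationary distribution (our own example, not taken from the slide), the sketch below finds π with πP = π for a small discrete-time transition matrix by power iteration. It assumes the chain is irreducible and aperiodic, so the limit exists and does not depend on the starting distribution; `stationary_distribution` is a hypothetical helper name.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat pi <- pi @ P until pi stops changing.

    P is row-stochastic: P[i, j] = Pr(x_{t+1} = j | x_t = i).
    Assumes an irreducible, aperiodic chain so the limit is unique.
    """
    P = np.asarray(P, dtype=float)
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P                             # one step of the process
        if np.abs(nxt - pi).sum() < tol:         # converged: pi is (numerically) stationary
            return nxt
        pi = nxt
    return pi

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(P)
```

Here π = (5/6, 1/6): stationarity requires the probability flow 0 → 1 to balance the flow 1 → 0, i.e. π_0 · 0.1 = π_1 · 0.5.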

Inference on trees and the EM algorithm: summary. Inference using dynamic programming (up-down message passing): Marginal/posterior probabilities: The EM update rule (conditional probabilities per lineage):
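A minimal sketch of the upward ("up") pass, assuming a rooted tree given as a `children` dict, one row-stochastic matrix per edge (`trans[c][a, b] = Pr(x_c = b | parent in state a)`) and observed leaves; all names are hypothetical. At the root the down message is just the prior, so the root posterior is prior × up, normalized.

```python
import numpy as np

def upward(node, children, trans, leaf_state, n_states):
    """up[a] = Pr(observed leaves below `node` | node is in state a)."""
    if node in leaf_state:                      # observed leaf: indicator vector
        up = np.zeros(n_states)
        up[leaf_state[node]] = 1.0
        return up
    up = np.ones(n_states)
    for child in children[node]:
        # for each parent state a: sum_b Pr(x_child = b | a) * up_child[b]
        up *= trans[child] @ upward(child, children, trans, leaf_state, n_states)
    return up

def tree_likelihood(root, prior, children, trans, leaf_state, n_states):
    """Total probability of the observed leaves."""
    return float(prior @ upward(root, children, trans, leaf_state, n_states))

def root_posterior(root, prior, children, trans, leaf_state, n_states):
    """Pr(x_root | leaves): the down message at the root is the prior."""
    joint = prior * upward(root, children, trans, leaf_state, n_states)
    return joint / joint.sum()

# Hypothetical cherry: root "r" with two observed leaves, both in state 0.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
children = {"r": ["a", "b"]}
trans = {"a": P, "b": P}
leaves = {"a": 0, "b": 0}
prior = np.array([0.5, 0.5])
lik = tree_likelihood("r", prior, children, trans, leaves, 2)    # 0.5*0.81 + 0.5*0.04
post = root_posterior("r", prior, children, trans, leaves, 2)
```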

Bayesian Networks: Define the joint probability of a set of random variables given (1) a directed acyclic graph G and (2) the conditional probability of each variable given its parents. Claim: if G is a tree, we can compute marginals and the total probability efficiently. Proof: exactly what we did last time. Claim: for general G, inference is NP-hard (whiteboard/exercise). Why will the up-down algorithm not work? We will discuss methods for approximate inference in detail later; for now, let's look for easier cases. Claim: if G has no cycles, … (whiteboard/exercise).
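The factorization meant here is the standard BN joint distribution, written in a notation we chose (pa(i) denotes the parents of X_i in G). On a tree rooted at r every node except the root has exactly one parent, which gives the product over edges that the up-down algorithm exploits:

```latex
\Pr(X_1,\dots,X_n) \;=\; \prod_{i=1}^{n} \Pr\big(X_i \mid X_{\mathrm{pa}(i)}\big),
\qquad
\Pr(x) \;=\; \Pr(x_r)\prod_{(i \to c)\in E(G)} \Pr(x_c \mid x_i) \quad \text{(tree rooted at } r\text{)}.
```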

Markov Models: x_t is the state at time t. The transition probabilities Pr(x_{t+1} | x_t) define the process. Adding an initial condition defines a distribution on infinite sequences. Problem: we observe finite sequences, and infinite probability spaces are difficult to work with. Solution: add an absorbing finish state, and add a start state to express the probabilities at time 0.
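A minimal sketch of how the start and absorbing finish states turn the chain into a distribution over finite sequences. The dictionary layout and the name `sequence_log_prob` are our own assumptions; each state's outgoing probabilities (to other states plus to finish) must sum to 1.

```python
import math

def sequence_log_prob(seq, start, trans, finish):
    """log Pr(seq) for a first-order Markov chain with explicit start/finish.

    start[a]    : Pr(start -> a), i.e. Pr(x_1 = a)
    trans[a][b] : Pr(x_{t+1} = b | x_t = a) among ordinary states
    finish[a]   : Pr(a -> finish); sum_b trans[a][b] + finish[a] == 1
    """
    logp = math.log(start[seq[0]])
    for a, b in zip(seq, seq[1:]):
        logp += math.log(trans[a][b])
    return logp + math.log(finish[seq[-1]])  # the sequence ends by entering finish

# Hypothetical two-state chain.
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.6, "B": 0.3}, "B": {"A": 0.3, "B": 0.6}}
finish = {"A": 0.1, "B": 0.1}
p_ab = math.exp(sequence_log_prob("AB", start, trans, finish))  # 0.5 * 0.3 * 0.1
```

Because finish is reachable from every state, the chain is absorbed with probability 1, so the probabilities of all finite sequences sum to 1.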

Hidden Markov Models: We observe only the emissions of the states, into some probability space E. Each state is equipped with an emission distribution Pr(e | x) (x a state, e an emission). (Figure: state diagram with the emission space.) Caution! This diagram is NOT the HMM Bayes net: 1. it has cycles; 2. its nodes are states, NOT random variables!
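To make the generative story concrete, here is a hedged sketch that samples from an HMM with start and finish states: walk the hidden states, emit one symbol per visited state, and stop on entering finish. The two-state GC-rich/AT-rich model and all parameter values are hypothetical.

```python
import random

def sample_hmm(start, trans, finish, emit, rng):
    """Sample (hidden state path, emissions) from an HMM with start/finish states.

    start[x]    : Pr(start -> x)
    trans[x][y] : Pr(x -> y) between emitting states
    finish[x]   : Pr(x -> finish); sum(trans[x].values()) + finish[x] == 1
    emit[x][e]  : Pr(emission e | state x)
    In practice only the emissions are observed.
    """
    def draw(dist):
        # Draw one outcome from a dict {outcome: probability}.
        return rng.choices(list(dist), weights=list(dist.values()))[0]

    states, emissions = [], []
    x = draw(start)
    while True:
        states.append(x)
        emissions.append(draw(emit[x]))
        nxt = draw({**trans[x], "FINISH": finish[x]})  # move on, or stop
        if nxt == "FINISH":
            return states, emissions
        x = nxt

# Hypothetical model: GC-rich ("H") vs AT-rich ("L") regions.
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.85, "L": 0.1}, "L": {"H": 0.1, "L": 0.85}}
finish = {"H": 0.05, "L": 0.05}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
states, emissions = sample_hmm(start, trans, finish, emit, random.Random(1))
```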

Hidden Markov Models: The HMM can be viewed as a template model. Given a sequence of observations, or just its length, we can form a BN. Since this BN has a tree topology, we already know how to compute posteriors. (Figure: the unrolled BN with Start, States, Emissions and Finish nodes.)

Inference in HMMs: the forward formula is like the down algorithm, and the backward formula is like the up algorithm. Basically, exactly what we saw for trees.
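A minimal forward-backward sketch, in a notation we chose (start vector, transition matrix T, emission matrix E, finish vector): f_t(x) = E(x, e_t) Σ_y f_{t-1}(y) T(y, x) and b_t(x) = Σ_y T(x, y) E(y, e_{t+1}) b_{t+1}(y). It works in plain probabilities, which is fine for short sequences; long ones would need scaling or log-space. The parameter values are hypothetical.

```python
import numpy as np

def forward_backward(obs, start, T, E, finish):
    """Forward/backward tables for an HMM with start and finish states.

    start[x]  : Pr(start -> x)        T[x, y]   : Pr(x -> y)
    E[x, e]   : Pr(e | x)             finish[x] : Pr(x -> finish)
    Returns f, b, the likelihood, and posterior[t, x] = Pr(x_t = x | obs).
    """
    n, k = len(obs), len(start)
    f = np.zeros((n, k))
    b = np.zeros((n, k))
    # Forward (like the down pass): f[t, x] = Pr(e_1..e_t, x_t = x)
    f[0] = start * E[:, obs[0]]
    for t in range(1, n):
        f[t] = (f[t - 1] @ T) * E[:, obs[t]]
    # Backward (like the up pass): b[t, x] = Pr(e_{t+1}..e_n, finish | x_t = x)
    b[n - 1] = finish
    for t in range(n - 2, -1, -1):
        b[t] = T @ (E[:, obs[t + 1]] * b[t + 1])
    likelihood = float(f[n - 1] @ finish)
    return f, b, likelihood, f * b / likelihood

# Hypothetical 2-state model over A, C, G, T (encoded 0..3).
start = np.array([0.5, 0.5])
T = np.array([[0.8, 0.1],
              [0.1, 0.8]])
finish = np.array([0.1, 0.1])
E = np.array([[0.4, 0.1, 0.1, 0.4],
              [0.1, 0.4, 0.4, 0.1]])
f, b, lik, post = forward_backward([0, 1, 2, 3], start, T, E, finish)
```

As with the tree up-down, the product f_t · b_t summed over states gives the same likelihood at every position t, which is a handy correctness check.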

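EM for HMMs: Can we apply the tree EM verbatim? (Figure: the unrolled HMM with Start, States, Emissions and Finish nodes.) Almost, but we have to share parameters: the same transition and emission probabilities are used at every position, so their expected counts are pooled over positions. Claim: HMM EM monotonically improves the likelihood (i.e., sharing parameters is OK).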
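A hedged sketch of one EM (Baum-Welch) iteration showing the parameter sharing: per-position posteriors are summed over all positions and sequences before normalizing. It reuses the start/T/E/finish layout of a plain forward-backward pass (redefined here so the block stands alone); the training sequences and initial values are made up.

```python
import numpy as np

def _forward_backward(obs, start, T, E, finish):
    n = len(obs)
    f, b = np.zeros((n, len(start))), np.zeros((n, len(start)))
    f[0] = start * E[:, obs[0]]
    for t in range(1, n):
        f[t] = (f[t - 1] @ T) * E[:, obs[t]]
    b[-1] = finish
    for t in range(n - 2, -1, -1):
        b[t] = T @ (E[:, obs[t + 1]] * b[t + 1])
    return f, b, float(f[-1] @ finish)

def em_step(seqs, start, T, E, finish):
    """One EM iteration; returns new parameters and the log-likelihood of the OLD ones."""
    k = len(start)
    n_start, n_finish = np.zeros(k), np.zeros(k)
    n_trans, n_emit = np.zeros((k, k)), np.zeros_like(E)
    loglik = 0.0
    for obs in seqs:
        f, b, lik = _forward_backward(obs, start, T, E, finish)
        loglik += np.log(lik)
        post = f * b / lik                       # Pr(x_t = x | obs)
        n_start += post[0]
        n_finish += post[-1]                     # last state moves to finish
        for t, e in enumerate(obs):
            n_emit[:, e] += post[t]              # shared emissions: pool over t
        for t in range(len(obs) - 1):
            # Pr(x_t = x, x_{t+1} = y | obs), pooled over t (shared transitions)
            n_trans += np.outer(f[t], E[:, obs[t + 1]] * b[t + 1]) * T / lik
    exits = n_trans.sum(axis=1) + n_finish       # expected transitions out of each state
    return (n_start / n_start.sum(), n_trans / exits[:, None],
            n_emit / n_emit.sum(axis=1, keepdims=True), n_finish / exits, loglik)

seqs = [[0, 3, 0, 0, 1, 2, 2, 1, 3, 0], [2, 1, 1, 2, 0, 3]]
params = (np.array([0.5, 0.5]), np.array([[0.7, 0.2], [0.2, 0.7]]),
          np.array([[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]), np.array([0.1, 0.1]))
history = []
for _ in range(20):
    *params, ll = em_step(seqs, *params)
    history.append(ll)
```

The recorded log-likelihoods should never decrease, which is the monotonicity claim on the slide.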

Hidden states: Example: two Markov models describe our data, and switching between the models occurs at random. How do we model this? (Figure: a hidden state with no emission.)

Hidden states: What about hidden cycles? (Figure: hidden vs. emitting states.)

Profile HMM for protein or DNA motifs (Figure: a chain of M, I and D states from S to F.) M (match) states emit a certain amino-acid/nucleotide profile. I (insert) states emit some background profile. D (delete) states are hidden. We can use EM to train the parameters from a set of examples, and then use the model for classification or annotation (both emission and transition probabilities are informative!). (How do we determine the right size of the model?) (Google PFAM, Prosite, “HMM profile protein domain”.)

N-order Markov model: For evolutionary models, the Markov property makes a lot of sense. For spatial (sequence) effects, the Markov property is a (horrible) heuristic. N-order relations can be modeled naturally. Common error: forward/backward in an N-order HMM. Would dynamic programming work?
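For an observed sequence (no hidden states), an order-k model is easy to fit and evaluate: each symbol is conditioned on the previous k symbols. The sketch below estimates Pr(next | context) by counting with pseudocounts, which is our own choice; the helper names are hypothetical, and the hidden-state (N-order HMM) case is the one left for the exercise.

```python
import math
from collections import Counter

def fit_order_k(seqs, k, alphabet, pseudo=1.0):
    """Estimate Pr(next symbol | previous k symbols) from counts plus pseudocounts."""
    counts, ctx_counts = Counter(), Counter()
    for s in seqs:
        for i in range(k, len(s)):
            counts[(s[i - k:i], s[i])] += 1
            ctx_counts[s[i - k:i]] += 1

    def prob(ctx, c):
        return (counts[(ctx, c)] + pseudo) / (ctx_counts[ctx] + pseudo * len(alphabet))
    return prob

def log_prob_order_k(s, k, prob):
    """log Pr(s[k:] | s[:k]) under an order-k chain (first k symbols conditioned on)."""
    return sum(math.log(prob(s[i - k:i], s[i])) for i in range(k, len(s)))

prob = fit_order_k(["ACGTACGT"], k=2, alphabet="ACGT")
lp = log_prob_order_k("ACGTACGT", 2, prob)
```

In this toy case the contexts AC and CG each occur twice and are always followed by G and T respectively, giving (2+1)/(2+4) = 0.5, while GT and TA occur once, giving 0.4; the sequence probability is 0.5^4 · 0.4^2 = 0.01.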

(Figure: the 1-HMM Bayes net and the 2-HMM Bayes net, each with Start, States, Emissions and Finish nodes.) You shall explore the inference problem in Ex 2.

Pair-HMM: Given two sequences s_1, s_2, an alignment is defined by a set of ‘gaps’ (or indels) in each of the sequences:

ACGCGAACCGAATGCCCAA---GGAAAACGTTTGAATTTATA
ACCCGT-----ATGCCCAACGGGGAAAACGTTTGAACTTATA

(each run of dashes is an indel). A standard dynamic programming algorithm computes the best alignment given such a distance metric. Standard distance metric: an affine gap cost plus a substitution matrix.
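A minimal sketch of that dynamic program with an affine gap cost, using Gotoh's three-matrix formulation. The conventions are our assumptions: a gap of length k costs gap_open + (k - 1) · gap_extend, and the substitution matrix is reduced to 0 for a match and a single mismatch cost. It returns the minimum cost only, without traceback.

```python
def affine_alignment_cost(a, b, mismatch=1.0, gap_open=2.0, gap_extend=1.0):
    """Minimum global alignment cost with affine gaps (Gotoh's three-matrix DP)."""
    INF = float("inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:  # open a new gap, or extend the gap in b
                X[i][j] = min(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                              Y[i - 1][j] + gap_open)
            if j > 0:  # open a new gap, or extend the gap in a
                Y[i][j] = min(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                              X[i][j - 1] + gap_open)
    return min(M[n][m], X[n][m], Y[n][m])

cost = affine_alignment_cost("ACGT", "AT")   # best: ACGT / A--T
```

With these costs a single gap of length 2 (2 + 1 = 3) beats two separate gaps (2 + 2 = 4), which is exactly the behavior the affine cost is meant to encourage.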

Pair-HMM: Generalize the HMM concept to probabilistically model alignments. Problem: we observe two sequences that are not a priori related. What will be emitted from our HMM? (Figure: states S, M, G_1, G_2 and F.) Match states emit an aligned nucleotide pair. Gap states emit a nucleotide from only one of the sequences. Pr(M->G_i) is the “gap open cost”; Pr(G_1->G_1) is the “gap extension cost”. Is it a BN template? Forward-backward formula? (Whiteboard/exercise)

Mixture models (whiteboard/exercise): Inference? EM for parameter estimation? What about very high dimensions?
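As one concrete instance of EM for a mixture (the slide does not fix the component family; a 1-D Gaussian mixture is our choice), the E-step computes each component's posterior responsibility for each point and the M-step re-estimates weights, means and variances from those soft assignments. The data and the helper name `gmm_em` are hypothetical.

```python
import numpy as np

def gmm_em(x, weights, means, variances, n_iter=50):
    """EM for a 1-D Gaussian mixture; also returns the log-likelihood before each update."""
    x = np.asarray(x, dtype=float)
    w, mu, var = (np.asarray(p, dtype=float).copy() for p in (weights, means, variances))
    history = []
    for _ in range(n_iter):
        # E-step: dens[n, k] = w_k N(x_n; mu_k, var_k); responsibilities r = Pr(k | x_n)
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1)
        history.append(float(np.log(total).sum()))
        r = dens / total[:, None]
        # M-step: maximum-likelihood estimates weighted by the responsibilities
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var, history

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
w, mu, var, hist = gmm_em(x, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

As with HMM EM, the log-likelihood is non-decreasing across iterations; in very high dimensions the density terms under- or overflow, so practical versions work in log-space.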
