GS 540 week 6 (presentation transcript)
HMM basics Given a sequence, and state parameters: – Each possible path through the states has a certain probability of emitting the sequence – P(O|M)
[Figure: example state paths through the sequence (emissions) A C G T A G C T T T, with the probabilities below.]
State path   P(taking this state path, given t-probs)   P(emitting this sequence from this state path, given e-probs)   Joint probability
1            .04                                        .01                                                             .0004
2            .10                                        .04                                                             .0040
3            .02                                        .03                                                             .0006
4            .06                                        .08                                                             .0048
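The joint probability of a sequence and a state path is the product of the two factors above: the probability of the path (from the transition probabilities) and the probability of the emissions along it. A minimal Python sketch of that calculation follows. The two-state model and every number in it (STATES, INIT, TRANS, EMIT) are made up for illustration and are not the parameters behind the slide's values.

```python
# Hypothetical two-state toy HMM; all numbers are illustrative assumptions.
STATES = ("s1", "s2")
INIT = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1},
         "s2": {"s1": 0.2, "s2": 0.8}}
EMIT = {"s1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "s2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def path_probability(path):
    """Probability of taking this state path, given the transition probabilities."""
    p = INIT[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= TRANS[prev][cur]
    return p


def emission_probability(seq, path):
    """Probability of emitting seq from this state path, given the emission probabilities."""
    p = 1.0
    for symbol, state in zip(seq, path):
        p *= EMIT[state][symbol]
    return p


def joint_probability(seq, path):
    """Joint probability of the sequence and the state path."""
    return path_probability(path) * emission_probability(seq, path)


if __name__ == "__main__":
    print(joint_probability("AC", ("s1", "s2")))
```

Summing `joint_probability` over every possible state path gives P(O|M) for the toy model.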
Viterbi Algorithm Highest weight path: the state path with the highest joint probability. [Figure: state lattice over the sequence A C G T A G C T T T; example path joint probabilities .0004, .0040, .0006, .0048.]
Viterbi Algorithm Store at each node: – Likelihood of the highest-likelihood path ending here – Traceback to the previous node in that path
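The two stored quantities map directly onto two tables. The sketch below is one possible Python version, not the course's reference solution: it works in log space, assumes every probability in the model is nonzero (so `math.log` never fails), and uses a made-up two-state model (STATES, INIT, TRANS, EMIT are illustrative assumptions).

```python
import math

# Hypothetical two-state toy HMM; all numbers are illustrative assumptions.
STATES = ("s1", "s2")
INIT = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1},
         "s2": {"s1": 0.2, "s2": 0.8}}
EMIT = {"s1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "s2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def viterbi(seq, states, init, trans, emit):
    """Return (best_path, log joint probability of seq and best_path)."""
    # score[i][k]: log-likelihood of the best path ending in state k at position i
    # back[i][k]:  state at position i-1 on that best path (the traceback pointer)
    score = [{k: math.log(init[k]) + math.log(emit[k][seq[0]]) for k in states}]
    back = [{}]
    for symbol in seq[1:]:
        prev = score[-1]
        column, pointers = {}, {}
        for k in states:
            best = max(states, key=lambda j: prev[j] + math.log(trans[j][k]))
            column[k] = prev[best] + math.log(trans[best][k]) + math.log(emit[k][symbol])
            pointers[k] = best
        score.append(column)
        back.append(pointers)

    # Start from the best final state and follow the pointers backwards.
    last = max(states, key=lambda k: score[-1][k])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[-1][last]


if __name__ == "__main__":
    best_path, log_p = viterbi("AACC", STATES, INIT, TRANS, EMIT)
    print(best_path, math.exp(log_p))
```

Each column depends only on the previous one, so the running time is proportional to (sequence length) x (number of states)^2.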
Pseudocode for Viterbi Algorithm Iterate over positions in sequence. At each position, fill in likelihood and traceback. Compute emission and transition counts along the highest-likelihood path: – number of times state k emits letter s – number of times state k transitions to state k' Compute new parameter values.
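The counting and re-estimation steps can be sketched as follows, assuming the highest-likelihood path has already been decoded. The function name `reestimate` and the `pseudocount` argument are assumptions added for illustration; the slide does not mention pseudocounts, which are included here only to avoid zero probabilities and division by zero for unused states.

```python
from collections import Counter


def reestimate(seq, path, states, alphabet, pseudocount=1.0):
    """Re-estimate emission and transition probabilities from one decoded path.

    seq and path must have the same length; path[i] is the state at position i.
    pseudocount is an illustrative assumption, not part of the slide's pseudocode.
    """
    # Number of times state k emits letter s.
    emit_counts = Counter(zip(path, seq))
    # Number of times state k transitions to state k'.
    trans_counts = Counter(zip(path, path[1:]))

    emit, trans = {}, {}
    for k in states:
        e_total = sum(emit_counts[(k, s)] + pseudocount for s in alphabet)
        emit[k] = {s: (emit_counts[(k, s)] + pseudocount) / e_total for s in alphabet}
        t_total = sum(trans_counts[(k, j)] + pseudocount for j in states)
        trans[k] = {j: (trans_counts[(k, j)] + pseudocount) / t_total for j in states}
    return emit, trans


if __name__ == "__main__":
    new_emit, new_trans = reestimate("AACC", ["s1", "s1", "s2", "s2"],
                                     ("s1", "s2"), "ACGT")
    print(new_emit)
    print(new_trans)
```

Viterbi training alternates this step with decoding under the new parameters until the path stops changing.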
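Likelihood function for Viterbi training Likelihood function for Viterbi: P(O, π* | M) = max over state paths π of P(O, π | M), the joint probability of the sequence and the single best path. Different from the likelihood function for EM: P(O | M) = sum over all state paths π of P(O, π | M).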
Machine learning debugging tips Debugging ML algorithms is much harder than debugging other code. Problem: It is hard to verify the output after each step.
Machine learning debugging tips Strategies: – Work out a simple example by hand. – Implement a simpler, less efficient algorithm and compare. – Compute your objective function (i.e. likelihood) at each iteration. – Stare hard at your code. – Don't use your real data to test.
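As one way to apply the "simpler, less efficient algorithm" strategy to an HMM, the sketch below enumerates every state path. It is exponential in the sequence length, so it is only usable on very short test sequences, but its best path and total likelihood can be compared against a Viterbi or forward implementation. The function name `brute_force` and the toy model (STATES, INIT, TRANS, EMIT) are assumptions for illustration.

```python
from itertools import product

# Hypothetical two-state toy HMM; all numbers are illustrative assumptions.
STATES = ("s1", "s2")
INIT = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1},
         "s2": {"s1": 0.2, "s2": 0.8}}
EMIT = {"s1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "s2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def brute_force(seq, states, init, trans, emit):
    """Enumerate all state paths.

    Returns (best_path, best_joint_probability, total_likelihood), where
    total_likelihood is the sum of the joint probability over all paths.
    """
    best_path, best_p, total = None, -1.0, 0.0
    for path in product(states, repeat=len(seq)):
        p = init[path[0]] * emit[path[0]][seq[0]]
        for i in range(1, len(seq)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][seq[i]]
        total += p
        if p > best_p:
            best_path, best_p = path, p
    return best_path, best_p, total


if __name__ == "__main__":
    print(brute_force("AACC", STATES, INIT, TRANS, EMIT))
```

The best path should match the Viterbi output, and the total should match the likelihood from the forward algorithm.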
HW5 Tips Template is not a real solution! Calculate L(M|O) by hand for the first few observations, compare to your results – For each site, what’s the likelihood of each state?
HW5 Tips Not necessary to explicitly create graph structure [although you may find it helpful]
HW5 Tips Template is not a real solution! Calculate L(M|O) by hand for the first few observations, compare to your results – For each site, what’s the likelihood of each state? Make a toy case: – AAAACCCCCCCCCCCCCCC – Easy to calculate L(M|O) by hand Use log space computations.
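A short illustration of why log space matters, written as a sketch rather than as part of the homework template: multiplying many small probabilities underflows to zero in floating point, while summing their logs does not. The helper `logsumexp` is an assumed name for the standard trick used when probabilities must be added (as in the forward algorithm) while staying in log space.

```python
import math


def logsumexp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == float("-inf"):
        # Every term is log(0), so the sum is 0 and its log is -inf.
        return m
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Multiply 2000 probabilities of 0.1 directly and in log space.
p = 1.0
log_p = 0.0
for _ in range(2000):
    p *= 0.1
    log_p += math.log(0.1)

if __name__ == "__main__":
    print(p)      # the direct product has underflowed
    print(log_p)  # the log-space value is still usable
    print(logsumexp([math.log(0.2), math.log(0.3)]))
```

Viterbi needs only sums and maxima of logs; the forward and backward recursions also need `logsumexp` (or per-position scaling).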
HW6: Baum-Welch Goal: learn HMM parameters taking into account all paths. Expectation maximization: – Forward-backward algorithm. – Re-estimate parameter values based on expected counts.
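The forward-backward step can be sketched as below. This is a minimal version under stated assumptions, not the homework solution: it works in plain probability space, so it is only suitable for short sequences (a real implementation would use log space or scaling), and the toy model (STATES, INIT, TRANS, EMIT) is made up for illustration.

```python
# Hypothetical two-state toy HMM; all numbers are illustrative assumptions.
STATES = ("s1", "s2")
INIT = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1},
         "s2": {"s1": 0.2, "s2": 0.8}}
EMIT = {"s1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "s2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def forward_backward(seq, states, init, trans, emit):
    """Return (likelihood, posteriors); posteriors[i][k] = P(state k at i | seq)."""
    # Forward: fwd[i][k] = P(seq[0..i], state at i is k)
    fwd = [{k: init[k] * emit[k][seq[0]] for k in states}]
    for symbol in seq[1:]:
        prev = fwd[-1]
        fwd.append({k: emit[k][symbol] * sum(prev[j] * trans[j][k] for j in states)
                    for k in states})

    # Backward: bwd[i][j] = P(seq[i+1..] | state at i is j)
    bwd = [{k: 1.0 for k in states}]
    for symbol in reversed(seq[1:]):
        nxt = bwd[0]
        bwd.insert(0, {j: sum(trans[j][k] * emit[k][symbol] * nxt[k] for k in states)
                       for j in states})

    likelihood = sum(fwd[-1].values())
    posteriors = [{k: fwd[i][k] * bwd[i][k] / likelihood for k in states}
                  for i in range(len(seq))]
    return likelihood, posteriors


if __name__ == "__main__":
    likelihood, posteriors = forward_backward("AC", STATES, INIT, TRANS, EMIT)
    print(likelihood)
    print(posteriors)
```

Summing `posteriors[i][k]` over the positions where letter s occurs gives the expected number of times state k emits s, which is the kind of expected count the re-estimation step uses.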
Dynamic Bayesian network: More than one (hidden) random variable per position...
Bayesian network: Arbitrary structure of random variables
Probabilistic model inference algorithms Problem: Given a model, what is the probability that a variable X has value x? Belief propagation HMMs: Forwards-backwards algorithms Problem: Given a model, what is the most likely assignment of variables? Maximum a posteriori (MAP) inference Viterbi algorithm
Probabilistic model learning algorithms Problem: Learn model parameters. EM: – E step: Use inference to get estimate of hidden variable values. – M step: Re-estimate parameter values. HMMs: Baum-Welch algorithm Use belief propagation for inference: "soft EM" Use Viterbi inference: "hard EM"
What type of inference should you use to get an estimate of latent variable values? 1. Belief propagation, then choose the most likely value for each variable. – Might get an impossible set of values. 2. Viterbi inference: – Might get unrealistic values.
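The "impossible set of values" case can be shown on a tiny made-up model. In the sketch below, every state emits the same symbol with probability 1, so the posteriors depend only on the initial and transition probabilities, and they are computed by brute-force enumeration of the paths of length 2. All names and numbers are assumptions chosen to trigger the problem.

```python
from itertools import product

# Hypothetical three-state model; every state emits the same symbol with
# probability 1, so emissions are omitted. All numbers are assumptions.
STATES = ("a", "b", "c")
INIT = {"a": 0.4, "b": 0.3, "c": 0.3}
TRANS = {"a": {"a": 1.0, "b": 0.0, "c": 0.0},
         "b": {"a": 0.0, "b": 0.0, "c": 1.0},
         "c": {"a": 0.0, "b": 0.0, "c": 1.0}}
LENGTH = 2


def path_prob(path):
    """Probability of a state path under INIT and TRANS."""
    p = INIT[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= TRANS[prev][cur]
    return p


PATHS = {path: path_prob(path) for path in product(STATES, repeat=LENGTH)}
TOTAL = sum(PATHS.values())


def posterior(i, k):
    """Marginal probability that the state at position i is k."""
    return sum(p for path, p in PATHS.items() if path[i] == k) / TOTAL


# Option 1: most likely state at each position, chosen independently.
posterior_path = tuple(max(STATES, key=lambda k: posterior(i, k))
                       for i in range(LENGTH))
# Option 2: single most likely path (what Viterbi would return).
viterbi_path = max(PATHS, key=PATHS.get)

if __name__ == "__main__":
    print(posterior_path, PATHS[posterior_path])
    print(viterbi_path, PATHS[viterbi_path])
```

Here the per-position choice is state a followed by state c, but the transition a to c has probability 0, so that combination can never occur; the single best path stays in state a.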
Junction Trees A junction tree is a subgraph of the clique graph that: 1. Is a tree 2. Contains all the nodes of the clique graph 3. Satisfies the running intersection property. Running intersection property: For each pair U, V of cliques with intersection S, all cliques on the path between U and V contain S.
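The running intersection property can be checked mechanically on a small candidate tree. The sketch below is an illustrative checker, not part of any library: cliques are given as a dict from a label to a set of variables, the tree as a list of undirected edges between labels, and the names `tree_path` and `has_running_intersection` are assumptions.

```python
from itertools import combinations


def tree_path(edges, start, goal):
    """Return the list of nodes on the path from start to goal, or None."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    stack = [(start, [start])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append((neighbor, path + [neighbor]))
    return None


def has_running_intersection(cliques, edges):
    """True if, for every pair of cliques U, V, each clique on the path
    between them contains the intersection of U and V."""
    for u, v in combinations(cliques, 2):
        shared = cliques[u] & cliques[v]
        path = tree_path(edges, u, v)
        if path is None:  # not connected, so not a tree over all cliques
            return False
        if any(not shared <= cliques[w] for w in path):
            return False
    return True


CLIQUES = {"ABC": {"A", "B", "C"}, "BCD": {"B", "C", "D"}, "CDE": {"C", "D", "E"}}

if __name__ == "__main__":
    print(has_running_intersection(CLIQUES, [("ABC", "BCD"), ("BCD", "CDE")]))
    print(has_running_intersection(CLIQUES, [("ABC", "CDE"), ("CDE", "BCD")]))
```

In the second tree, ABC and BCD share B and C, but the clique CDE on the path between them does not contain B, so the property fails.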
Step 7: Populate the Cliques Place each potential from the original network in a clique containing all the variables it references For each clique node, form the product of the distributions in it (as in variable elimination).
Step 8.1: Incorporate Evidence For each evidence variable, go to one table that includes that variable. Set to 0 all entries in that table that disagree with the evidence.
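A minimal sketch of this step, assuming a clique table is stored as a dict from assignment tuples to values, with a parallel tuple of variable names. The function name `incorporate_evidence` and the example numbers are illustrative assumptions.

```python
def incorporate_evidence(variables, table, evidence):
    """Zero out every entry of the table that disagrees with the evidence.

    variables: tuple of variable names, in the order used by the table keys.
    table:     dict mapping assignment tuples to potential values.
    evidence:  dict mapping observed variable names to their observed values.
    """
    result = {}
    for assignment, value in table.items():
        agrees = all(assignment[variables.index(name)] == observed
                     for name, observed in evidence.items()
                     if name in variables)
        result[assignment] = value if agrees else 0.0
    return result


# Hypothetical clique table over (A, B) holding the joint P(A, B).
AB_VARS = ("A", "B")
AB_TABLE = {(0, 0): 0.56, (0, 1): 0.14, (1, 0): 0.03, (1, 1): 0.27}

if __name__ == "__main__":
    print(incorporate_evidence(AB_VARS, AB_TABLE, {"B": 1}))
```

After zeroing, the remaining entries sum to the probability of the evidence (here P(B=1) = 0.41), and dividing by that sum gives the conditional distribution.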
Step 8.2: Upward Pass For each leaf in the junction tree, send a message to its parent. The message is the marginal of its table, summing out any variable not in the separator. When a parent receives a message from a child, it multiplies its table by the message table to obtain its new table. When a parent receives messages from all its children, it repeats the process (acts as a leaf). This process continues until the root receives messages from all its children.
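One upward message can be sketched on a two-clique junction tree for a made-up chain A → B → C: the leaf clique {A, B} sends a message over the separator {B} to the root clique {B, C}. Tables are dicts from assignment tuples to values; the helper names `marginalize` and `multiply_in` and all numbers are assumptions for illustration.

```python
def marginalize(variables, table, keep):
    """Sum out every variable not in keep; keys of the result follow keep's order."""
    positions = [variables.index(name) for name in keep]
    result = {}
    for assignment, value in table.items():
        key = tuple(assignment[i] for i in positions)
        result[key] = result.get(key, 0.0) + value
    return result


def multiply_in(variables, table, message_vars, message):
    """Multiply each table entry by the matching entry of the message."""
    positions = [variables.index(name) for name in message_vars]
    return {assignment: value * message[tuple(assignment[i] for i in positions)]
            for assignment, value in table.items()}


# Leaf clique {A, B}: holds P(A) * P(B | A) (hypothetical numbers).
AB_VARS = ("A", "B")
AB_TABLE = {(0, 0): 0.56, (0, 1): 0.14, (1, 0): 0.03, (1, 1): 0.27}
# Root clique {B, C}: holds P(C | B) (hypothetical numbers).
BC_VARS = ("B", "C")
BC_TABLE = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}

# The message is the leaf's table with everything outside the separator summed out.
MESSAGE = marginalize(AB_VARS, AB_TABLE, ("B",))
# The parent multiplies its table by the message to obtain its new table.
ROOT_TABLE = multiply_in(BC_VARS, BC_TABLE, ("B",), MESSAGE)

if __name__ == "__main__":
    print(MESSAGE)
    print(ROOT_TABLE)
```

With no evidence, the message is P(B) and the root's new table is the joint P(B, C), whose entries sum to 1.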
Step 8.3: Downward Pass Reverses the upward pass, starting at the root. The root sends a message to each of its children. More specifically, the root divides its current table by the message received from the child, marginalizes the resulting table to the separator, and sends the result to the child. Each child multiplies its table by the message from its parent and repeats the process (acts as a root) until the leaves are reached. The table at each clique is then the joint marginal of its variables; sum out as needed. We're done!
Why Does This Work? The junction tree algorithm is just a way to do variable elimination in all directions at once, storing intermediate results at each step.
The Link Between Junction Trees and Variable Elimination To eliminate a variable at any step, we combine all remaining tables involving that variable. A node in the junction tree corresponds to the variables in one of the tables created during variable elimination (the other variables required to remove a variable). An arc in the junction tree shows the flow of data in the elimination computation.
Junction Tree Savings Avoids redundancy in repeated variable elimination Need to build junction tree only once ever Need to repeat belief propagation only when new evidence is received
Loopy Belief Propagation Inference is efficient if the graph is a tree. In general, inference cost is exponential in the treewidth (size of the largest clique in the triangulated graph, minus 1). What if the treewidth is too high? Solution: do belief propagation on the original graph. – May not converge, or may converge to a bad approximation. – In practice, often a fast and good approximation.