
Lecture 5: Learning models using EM




1 Lecture 5: Learning models using EM
Intro to Comp Genomics

2 Mixtures of Gaussians
We have experimental results of some value, and we want to describe their behavior: essentially one behavior? Two behaviors? More? In one dimension it may look very easy: just looking at the distribution will give us a good idea. We can formulate the model probabilistically as a mixture of normal distributions. As a generative model: to generate data from the model, we first select a sub-model by sampling from the mixture variable; we then generate a value using the selected normal distribution. If the data is multidimensional, the problem becomes non-trivial.
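A minimal sketch of this generative process in Python (the mixture weights, means and standard deviations below are illustrative placeholders, not values from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a two-component mixture (placeholders)
p = np.array([0.2, 0.8])      # mixture coefficients
mu = np.array([0.0, 1.0])     # component means
sd = np.array([1.0, 0.2])     # component standard deviations

def sample_mixture(n):
    """First sample the hidden mixture variable, then sample from the chosen Gaussian."""
    s = rng.choice(len(p), size=n, p=p)
    return s, rng.normal(mu[s], sd[s])

hidden, x = sample_mixture(1000)
```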

3 Inference
Let's represent the model as a two-component mixture: Pr(x) = p0 N(x; m0, s0) + p1 N(x; m1, s1), with a hidden component indicator s.
What is the inference problem in our model? Inference: computing the posterior probability of a hidden variable given the data and the model parameters. For p0=0.2, p1=0.8, m0=0, m1=1, s0=1, s1=0.2, what is Pr(s=0 | x=0.8)?
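A short sketch of this inference computation via Bayes' rule, using the numbers given on the slide:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sd):
    return exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

p, mu, sd, x = [0.2, 0.8], [0.0, 1.0], [1.0, 0.2], 0.8

# Bayes' rule: Pr(s=0 | x) = p0 N(x; m0, s0) / sum_k pk N(x; mk, sk)
joint = [p[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
print(joint[0] / sum(joint))   # roughly 0.06: x=0.8 is much more likely under component 1
```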

4 Estimation/parameter learning
Given data, how can we estimate the model parameters? Transform it into an optimization problem! The likelihood is a function of the parameters, defined given the data. Find the parameters that maximize the likelihood: the ML problem. It can be approached heuristically, using generic optimization techniques: gradient ascent, simulated annealing, genetic algorithms, and more. But it is a nonlinear problem which may be very difficult.
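A sketch of the "generic optimization" route for the two-Gaussian mixture, assuming the data sit in an array x: hand the negative log-likelihood to a general-purpose optimizer (this is not yet the EM algorithm of the next slide).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(theta, x):
    """theta = (logit p0, m0, m1, log s0, log s1), reparameterized to keep p and s valid."""
    p0 = 1.0 / (1.0 + np.exp(-theta[0]))
    m0, m1 = theta[1], theta[2]
    s0, s1 = np.exp(theta[3]), np.exp(theta[4])
    lik = p0 * norm.pdf(x, m0, s0) + (1.0 - p0) * norm.pdf(x, m1, s1)
    return -np.sum(np.log(lik))

x = np.random.default_rng(0).normal(1.0, 0.5, size=500)   # placeholder data
res = minimize(neg_log_likelihood, x0=np.zeros(5), args=(x,), method="Nelder-Mead")
```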

5 The EM algorithm for mixtures
We start by guessing parameters. We then go over the samples and compute their posteriors (i.e., inference). We use the posteriors to compute new estimates for the expected sufficient statistics of each distribution, and for the mixture coefficients. Continue iterating until convergence. The EM theorem: the algorithm will converge and will improve the likelihood monotonically. But: there is no guarantee of finding the optimum, or of finding anything meaningful. The initial conditions are critical: think of starting from m0=0, m1=10, s0=s1=1. Solutions: start from "reasonable" initial points, and try many starting points.
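A compact sketch of these EM updates for the two-component case (a minimal illustration; the fixed iteration count and the default starting point are placeholders):

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, p0=0.5, mu=(-1.0, 1.0), sd=(1.0, 1.0), n_iter=100):
    p = np.array([p0, 1.0 - p0])
    mu = np.array(mu, dtype=float)
    sd = np.array(sd, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior of each component for every sample (the inference step)
        joint = np.stack([p[k] * norm.pdf(x, mu[k], sd[k]) for k in range(2)])
        post = joint / joint.sum(axis=0)
        # M-step: expected sufficient statistics -> new parameter estimates
        n_k = post.sum(axis=1)
        p = n_k / len(x)
        mu = (post * x).sum(axis=1) / n_k
        sd = np.sqrt((post * (x - mu[:, None]) ** 2).sum(axis=1) / n_k)
    return p, mu, sd
```

Running it from several starting points and keeping the highest-likelihood result addresses the initialization issue described above.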

6 Hidden Markov Models
We observe only the emissions of states into some probability space E; each state is equipped with an emission distribution (x a state, e an emission). Caution! The state-transition diagram is NOT the HMM Bayes net: 1. it contains cycles; 2. its states are NOT random variables.
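In standard notation (a reconstruction; only the caption text of the formula survived in the transcript), each state x carries an emission distribution over the space E:

```latex
e_x(b) = \Pr(\text{emission} = b \mid \text{state} = x), \qquad b \in E, \qquad \sum_{b \in E} e_x(b) = 1
```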

7 Simple example: Mixture with “memory”
We sample a sequence of dependent values: at each step, we decide whether to continue sampling from the same distribution or to switch, with switching probability p (figure: two states, A and B). We can compute the probability directly only given the hidden variables; P(x) is derived by summing over all possible combinations of the hidden variables. This is another form of the inference problem (why?). There is an exponential number of h assignments; can we still solve the problem efficiently?
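Written out, following the slide's description of the switching process (for two distributions, the hidden assignment h ranges over 2^n possibilities):

```latex
P(x) = \sum_{h_1,\dots,h_n} P(h_1)\prod_{i=2}^{n} P(h_i \mid h_{i-1}) \prod_{i=1}^{n} P(x_i \mid h_i),
\qquad
P(h_i \mid h_{i-1}) = \begin{cases} 1-p & h_i = h_{i-1} \\ p & h_i \neq h_{i-1} \end{cases}
```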

8 Inference in HMM
The forward formula and the backward formula, illustrated on the trellis: Start, States, Finish, with Emissions along the sequence.
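The formulas themselves were figures on the slide; in their standard form (the lecture's indexing conventions may differ), with transition probabilities T(s', s) and emission distributions e_s:

```latex
\text{Forward:}\quad \alpha_i(s) = e_s(x_i)\sum_{s'} \alpha_{i-1}(s')\,T(s', s),
\qquad \alpha_0(\text{Start}) = 1
\\[4pt]
\text{Backward:}\quad \beta_i(s) = \sum_{s'} T(s, s')\,e_{s'}(x_{i+1})\,\beta_{i+1}(s'),
\qquad \beta_n(s) = T(s, \text{Finish})
```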

9 Inference in HMM
The forward and backward formulas, continued on the same trellis (Start, States, Finish, Emissions).
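These two recursions yield the total probability in several equivalent ways, which is also the consistency check requested later on slide 19:

```latex
P(x) = \sum_{s} \alpha_n(s)\,T(s, \text{Finish}) = \beta_0(\text{Start})
     = \sum_{s} \alpha_i(s)\,\beta_i(s) \quad \text{for any position } i
```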

10 EM for HMMs
(Trellis: Start, States, Finish, Emissions.) What is the posterior probability of emitting the i'th character from state s? What is the posterior probability of a transition from s' to s after character i? With multiple sequences, assume independence (accumulate the statistics). Claim: HMM EM monotonically improves the likelihood.
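In terms of the forward and backward quantities, these posteriors are the standard Baum-Welch statistics (a reconstruction of the formulas shown as figures):

```latex
\Pr(h_i = s \mid x) = \frac{\alpha_i(s)\,\beta_i(s)}{P(x)},
\qquad
\Pr(h_i = s',\ h_{i+1} = s \mid x) = \frac{\alpha_i(s')\,T(s', s)\,e_s(x_{i+1})\,\beta_{i+1}(s)}{P(x)}
```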

11 The EM theorem for mixtures simplified
Assume that we know which distribution generated each sample (the samples Si were generated by distribution i). We want to maximize the model's likelihood given this extra information. The likelihood factors, so we can solve separately for each part; the mixture coefficients give a "multinomial estimator", which we solve using Lagrange multipliers.
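The multinomial part written out, using the standard Lagrange-multiplier calculation (with N = sum of the |S_i|):

```latex
\max_{p} \sum_i |S_i| \log p_i \ \ \text{s.t.}\ \sum_i p_i = 1:
\qquad
\frac{\partial}{\partial p_i}\Big[\sum_j |S_j|\log p_j + \lambda\big(1 - \sum_j p_j\big)\Big]
= \frac{|S_i|}{p_i} - \lambda = 0
\ \Rightarrow\ p_i = \frac{|S_i|}{\lambda} = \frac{|S_i|}{N}
```

The constraint forces lambda = sum_i |S_i| = N.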

12 The EM theorem for mixtures simplified
Assume again that we know which distribution generated each sample (the samples Si were generated by distribution i), and maximize the likelihood given this extra information. Solving separately for each component gives the normal distribution estimator, using the observed sufficient statistics (the normal is an exponential family). We have thus found the global optimum of the likelihood in the case of full data.
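The separate per-component maximization is the familiar Gaussian ML estimator from the observed sufficient statistics (the sums of x and of x^2):

```latex
\hat{\mu}_i = \frac{1}{|S_i|}\sum_{x \in S_i} x,
\qquad
\hat{\sigma}_i^2 = \frac{1}{|S_i|}\sum_{x \in S_i} \big(x - \hat{\mu}_i\big)^2
```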

13 The EM theorem for mixtures simplified
Assume now that each sample i is known to be from distribution j with probability Pij. We can write down a weighted log-likelihood; the same maximization holds, solving separately for each part, and this derives the EM update formulas. In the EM algorithm the weights Pij are posteriors computed from the current parameters, so Q depends on the current parameters and we write it as Q(theta | theta^t). What is missing? Q is not L!
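Written out for the mixture, with the weights taken from the current parameters, this is the usual EM auxiliary function (a standard reconstruction of the formula that appeared as a figure):

```latex
Q(\theta \mid \theta^t) = \sum_i \sum_j P_{ij}\,\log\!\big(p_j\, N(x_i;\ \mu_j, \sigma_j)\big),
\qquad
P_{ij} = \Pr(\text{component } j \mid x_i,\ \theta^t)
```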

14 Expectation-Maximization
The EM theorem (Dempster et al.): combining the facts that the relative entropy is >= 0 and that the M-step maximizes Q shows that each EM iteration cannot decrease the likelihood.
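A standard reconstruction of the argument sketched on the slide: decompose the log-likelihood around the current parameters theta^t, then use the non-negativity of the relative entropy.

```latex
\log P(x \mid \theta) = Q(\theta \mid \theta^t) + H(\theta \mid \theta^t),
\qquad
H(\theta \mid \theta^t) = -\sum_h P(h \mid x, \theta^t)\log P(h \mid x, \theta)
\\[4pt]
\log P(x \mid \theta) - \log P(x \mid \theta^t)
= \big[Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t)\big]
+ D_{KL}\big(P(h \mid x, \theta^t)\,\big\|\,P(h \mid x, \theta)\big)
\ \geq\ Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t)
```

So any theta that improves Q over theta^t also improves the likelihood, which is exactly what the M-step guarantees.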

15 KL-divergence
Entropy (Shannon) and the Kullback-Leibler divergence. KL is not a metric!!
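The definitions behind the slide's keywords:

```latex
H(P) = -\sum_x P(x)\log P(x),
\qquad
D_{KL}(P \,\|\, Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)} \ \geq\ 0
```

D_KL(P||Q) is zero iff P = Q, but it is not symmetric and does not satisfy the triangle inequality, which is why it is not a metric.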

16 Bayesian learning vs. Maximum likelihood
The maximum likelihood estimator uses no prior beliefs. Bayesian learning introduces prior beliefs on the process (alternatively: think of them as virtual evidence) and computes posterior probabilities over the parameters. (Figure: likelihood and beliefs over the parameter space, with the MLE, MAP and PME estimators marked.)
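The three estimators marked in the figure, in symbols (assuming PME on the slide stands for the posterior mean estimator):

```latex
\hat{\theta}_{MLE} = \arg\max_\theta P(D \mid \theta),
\qquad
\hat{\theta}_{MAP} = \arg\max_\theta P(D \mid \theta)\,P(\theta),
\qquad
\hat{\theta}_{PME} = E[\theta \mid D] = \int \theta\, P(\theta \mid D)\, d\theta
```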

17 Your Task
Preparations: Get your hands on the ChIP-seq profiles of CTCF and PolII in hg chr17, bin-size = 50bp. Cut the data into segments of 50,000 data points. Background: ChIP-seq, CTCF and PolII; modeling ChIP-seq, binning.
Modeling: Use EM to build a probabilistic model for the peak signals and the background. Use heuristics for peak finding to initialize the EM.
Analysis: Test whether your model for a single peak structure is as good as the model for two peak structures. Compute the distribution of peaks relative to transcription start sites.
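A tiny sketch of the preparation step (the input file name and loading routine are hypothetical placeholders for however the binned profiles are provided):

```python
import numpy as np

# Per-bin read counts along chr17 (hypothetical file name and format)
signal = np.loadtxt("ctcf_chr17_50bp_bins.txt")
segments = [signal[i:i + 50000] for i in range(0, len(signal), 50000)]
```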

18 Your Task: Modeling
The model uses K states for the peak and one state for the background; use K = 40. (State diagram: start S, peak states P1, P2, P3, ..., background B, finish F.) A sketch of one possible encoding follows.
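One plausible way to encode such a topology, as a hedged sketch (the transition structure and the probability of entering a peak are assumptions, not given on the slide): a background state that occasionally enters a left-to-right chain of K peak states, which then returns to the background.

```python
import numpy as np

K = 40                      # number of peak states (slide: use K = 40)
n_states = K + 1            # states 0..K-1 are peak states, state K is background
p_enter = 1e-3              # assumed probability of starting a peak from background

T = np.zeros((n_states, n_states))
T[K, 0] = p_enter           # background -> first peak state
T[K, K] = 1.0 - p_enter     # background stays in background
for k in range(K - 1):
    T[k, k + 1] = 1.0       # peak states advance deterministically
T[K - 1, K] = 1.0           # last peak state returns to background
```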

19 Your Task: Modeling
Implement HMM inference: forward-backward (let's write them together; a sketch follows below). Make sure the total probability comes out equal for the forward and the backward algorithm! Implement the EM update rules. Run EM from multiple random starting points and record the likelihoods you derive. Implement smarter initialization: take the average values around all probes with a value over a threshold. Compute posterior peak probabilities: report all loci with P(Peak) > 0.8.
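A minimal scaled forward-backward sketch (it assumes a precomputed emission-likelihood matrix and an initial state distribution instead of explicit Start/Finish states; the names and interfaces are illustrative, not the course's required API):

```python
import numpy as np

def forward_backward(pi, T, E):
    """pi: initial state probs (S,); T: transition matrix (S, S);
    E: emission likelihoods per position (n, S).
    Returns scaled alpha, beta, scaling factors, and the data log-likelihood."""
    n, S = E.shape
    alpha, beta, scale = np.zeros((n, S)), np.zeros((n, S)), np.zeros(n)

    # Forward pass, rescaling at every step to avoid numerical underflow
    alpha[0] = pi * E[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for i in range(1, n):
        alpha[i] = (alpha[i - 1] @ T) * E[i]
        scale[i] = alpha[i].sum()
        alpha[i] /= scale[i]

    # Backward pass, reusing the same scaling factors
    beta[-1] = 1.0
    for i in range(n - 2, -1, -1):
        beta[i] = (T @ (E[i + 1] * beta[i + 1])) / scale[i + 1]

    loglik = np.log(scale).sum()
    # Consistency check: with this scaling, sum_s alpha_i(s) beta_i(s) = 1 at every i
    assert np.allclose((alpha * beta).sum(axis=1), 1.0)
    return alpha, beta, scale, loglik
```

With Gaussian emissions, E[i, s] is the density of bin i under state s; the posterior peak probability at bin i is then the sum of alpha[i, s] * beta[i, s] over the peak states, and the EM updates reuse alpha and beta as on slide 10.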

20 Your Task: Analysis
Compare the two peak structures you get (from CTCF and PolII). Retrain a single model jointly on the two datasets. Compute the log-likelihood of the unified model and compare it to the sum of the log-likelihoods of the two separate models. Optional: test whether the difference is significant by sampling data from the unified model, training two models on the synthetic data, and computing the likelihood delta as for the real data (a sketch follows below). Use a set of known TSSs to compute the distribution of peaks relative to genes.
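A hedged sketch of the optional significance test (a parametric bootstrap); train_hmm and sample_hmm are hypothetical helpers standing in for your own EM training and sampling code, passed in as arguments:

```python
import numpy as np

def likelihood_delta_test(train_hmm, sample_hmm, data_a, data_b, n_boot=100, seed=0):
    """Observed delta: (separate log-likelihoods) - (unified log-likelihood),
    compared against deltas from data sampled under the unified model."""
    rng = np.random.default_rng(seed)
    unified, ll_u = train_hmm(np.concatenate([data_a, data_b]))
    _, ll_a = train_hmm(data_a)
    _, ll_b = train_hmm(data_b)
    observed = (ll_a + ll_b) - ll_u

    null_deltas = []
    for _ in range(n_boot):
        syn_a = sample_hmm(unified, len(data_a), rng)   # synthetic data, same sizes
        syn_b = sample_hmm(unified, len(data_b), rng)
        _, sll_u = train_hmm(np.concatenate([syn_a, syn_b]))
        _, sll_a = train_hmm(syn_a)
        _, sll_b = train_hmm(syn_b)
        null_deltas.append((sll_a + sll_b) - sll_u)

    p_value = np.mean(np.array(null_deltas) >= observed)
    return observed, p_value
```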

