
# The Estimation Problem


The Estimation Problem

How would we select parameters in the limiting case where we had ALL the data? Intuitively, the actual frequencies of all the transitions would best describe the parameters we seek. The probability a_{k→l} of transitioning from state k to state l is then:

a_{k→l} = (count of k→l transitions) / (Σ_{l'} count of k→l' transitions)

That is, the counts of k to l transitions, divided by the counts of k to l' transitions summed over all possible states l'.
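The frequency counting above can be sketched in a few lines of Python (the helper name `transition_mle` is mine, not from the slides; states are assumed to be single-character labels):

```python
from collections import Counter

def transition_mle(sequence):
    """Estimate a_{k->l} as counts of k->l divided by all counts out of k."""
    counts = Counter(zip(sequence, sequence[1:]))  # c_{k->l} for each observed pair
    totals = Counter()                             # sum over l' of c_{k->l'}
    for (k, _l), c in counts.items():
        totals[k] += c
    return {(k, l): c / totals[k] for (k, l), c in counts.items()}

# With "all the data" the estimates are simply the observed frequencies.
a = transition_mle("S--+++")
print(a[("-", "-")], a[("-", "+")])  # 0.5 0.5 (one of each transition out of "-")
```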

The Estimation Problem

What about when we only have a sample? Consider the observed sequence X = "S--+++". Before we collected the data, the probability of this sequence was a function of θ, our set of unknown parameters:

P(X|θ) = P("S--+++"|θ) = a_{S→-} a_{-→-} a_{-→+} a_{+→+} a_{+→+}

However, our data is now fixed: we have already collected it. The parameters are also fixed, but unknown. We can therefore imagine values for the parameters, and treat the probability of the observed data as a function of θ.
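As a minimal sketch of "imagining values for the parameters": the two parameter settings below are made-up examples (not from the slides), and the product of transition probabilities along the fixed sequence "S--+++" changes as θ changes.

```python
def likelihood(theta, x):
    """P(x | theta): the product of transition probabilities along x.

    `theta` maps (k, l) pairs to hypothesized transition probabilities a_{k->l}.
    """
    p = 1.0
    for k, l in zip(x, x[1:]):
        p *= theta[(k, l)]
    return p

# Two imagined parameter settings for the same fixed data "S--+++":
theta1 = {("S", "-"): 1.0, ("-", "-"): 0.5, ("-", "+"): 0.5, ("+", "+"): 0.9}
theta2 = {("S", "-"): 1.0, ("-", "-"): 0.2, ("-", "+"): 0.8, ("+", "+"): 0.6}
print(likelihood(theta1, "S--+++"))  # 1.0 * 0.5 * 0.5 * 0.9 * 0.9 = 0.2025
print(likelihood(theta2, "S--+++"))  # 1.0 * 0.2 * 0.8 * 0.6 * 0.6 = 0.0576
```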

The Estimation Problem: The Likelihood Function

When we treat the probability of the observed data as a function of the parameters, we call it the likelihood function:

L(θ|X) = P("S--+++"|θ) = a_{S→-} a_{-→-} a_{-→+} a_{+→+} a_{+→+}

Caution! The likelihood function does not define a probability distribution or density over θ, and it need not integrate to 1.

A few things to notice: the probability of any particular sample we get is generally going to be pretty low, regardless of the true values of θ. Even so, the likelihood still tells us valuable information! We know, for instance, that a_{-→+} is not zero.

Maximum Likelihood Estimation

Maximum likelihood estimation seeks the solution that "best" explains the observed dataset:

θ_ML = argmax_θ P(X|θ)

or, equivalently,

θ_ML = argmax_θ log P(X|θ)

Translation: "select as our maximum likelihood parameters those parameters that maximize the probability of the observation given those parameters." That is, we seek to maximize P(X|θ) over all possible θ. This is sometimes called the maximum likelihood criterion.
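A crude way to see the argmax in action is a grid search over one free parameter. Below (my sketch, not from the slides) we let p = a_{-→+}, so a_{-→-} = 1 − p, and score only the transitions out of "-" in "S--+++" (one "-"→"-" and one "-"→"+"); the grid maximum lands exactly on the observed frequency.

```python
import math

def loglik(p):
    # One "-"->"-" transition (prob 1 - p) and one "-"->"+" transition (prob p).
    return math.log(1 - p) + math.log(p)

grid = [i / 100 for i in range(1, 100)]   # candidate values of p in (0, 1)
p_ml = max(grid, key=loglik)
print(p_ml)  # 0.5 -- exactly the observed frequency of "-"->"+"
```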

Maximum Likelihood Estimation

The log likelihood is often very handy, as we would otherwise need to deal with a long product of terms. Such products arise because there are multiple outcomes that need to be considered:

θ_ML = argmax_θ log Π_{i=1}^{k} P(x_i|θ) = argmax_θ Σ_{i=1}^{k} log P(x_i|θ)
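The practical reason for taking logs is numerical: a long product of probabilities underflows floating point, while the sum of logs stays well behaved. A small illustration (mine, not from the slides):

```python
import math

# 1000 i.i.d. outcomes, each with probability 0.25 under theta.
probs = [0.25] * 1000

product = 1.0
for p in probs:
    product *= p
print(product)        # 0.0 -- the product underflows double precision

log_lik = sum(math.log(p) for p in probs)
print(log_lik)        # about -1386.29, perfectly representable
```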

The Estimation Problem

We can also write the log likelihood function with frequencies: if outcome i occurs n_i times among the k observations, the sum of log probabilities collapses into a sum over the distinct outcomes,

Σ_{j=1}^{k} log P(x_j|θ) = Σ_i n_i log θ_i

Maximum Likelihood Estimation

Sometimes proving that a given parameter choice maximizes the likelihood function is the "tricky bit". In the general case, this is often done by finding the zeros of the derivative of the likelihood function, or by some other trick such as forcing the function into a particular form and relying on an inequality to prove it must be a maximum. Let's skip the gory details, and try to motivate this intuitively.

The Estimation Problem

Maybe it's enough to convince ourselves that the observed frequency

(count of k→l transitions) / (Σ_{l'} count of k→l' transitions)

will approach P(k→l | all the data) as the amount of sample data increases, toward the limit where we finally have all the data. Let's see how this plays out with a simple simulation.
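The simulation behind the plots that follow can be sketched like this (my reconstruction, assuming the uniform nucleotide distribution the slides describe): draw nucleotides from p_A = p_C = p_G = p_T = 0.25 and watch the estimated frequency of A settle toward 0.25 as the sample grows.

```python
import random

random.seed(0)

# True distribution is uniform over A, C, G, T, as in the plots below.
for n in (10, 100, 1000, 100_000):
    sample = random.choices("ACGT", k=n)
    p_a_hat = sample.count("A") / n     # MLE of p_A: the observed frequency
    print(n, p_a_hat)
```

For small n the estimate bounces around wildly; by n = 100,000 it sits very close to the true value of 0.25.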

Maximum Likelihood Estimation

Typical plot of a single sample of 10 nucleotides. The underlying distribution this was sampled from was uniform (p_A = p_C = p_G = p_T = 0.25). MLE is prone to overfitting the data when the sample is small.

Maximum Likelihood Estimation

Typical plot of 10 samples of 10 nucleotides. The underlying distribution this was sampled from was uniform (p_A = p_C = p_G = p_T = 0.25).

Maximum Likelihood Estimation

Typical plot of 100 samples of 10 nucleotides. The underlying distribution this was sampled from was uniform (p_A = p_C = p_G = p_T = 0.25).

Maximum Likelihood Estimation

Typical plot of 1000 samples of 10 nucleotides. The underlying distribution this was sampled from was uniform (p_A = p_C = p_G = p_T = 0.25).


