Parameter Estimation using Likelihood Functions (Tutorial #1). This class has been cut and slightly edited from Nir Friedman’s full course of 12 lectures, which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger and Ydo Wexler.
3 Example: Binomial Experiment. When a thumbtack is tossed, it can land in one of two positions: Head or Tail. We denote by θ the (unknown) probability P(H). Estimation task: given a sequence of toss samples x[1], x[2], …, x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.
4 Statistical Parameter Fitting. Consider instances x[1], x[2], …, x[M] such that: the set of values that x can take is known; each is sampled from the same distribution; each is sampled independently of the rest (i.i.d. samples). The task is to find a vector of parameters Θ that has generated the given data. This parameter vector can then be used to predict future data.
5 The Likelihood Function. How good is a particular θ? It depends on how likely it is to generate the observed data: L(θ) = P(D | θ). The likelihood for the sequence H, T, T, H, H is L(θ) = θ · (1 − θ) · (1 − θ) · θ · θ = θ³(1 − θ)². [Plot: L(θ) as a function of θ over [0, 1].]
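The slide's likelihood for H, T, T, H, H can be sketched in a few lines of Python. The function name `likelihood` is an illustrative choice, not from the original lecture.

```python
def likelihood(theta, sequence):
    """L(theta) = theta^N_H * (1 - theta)^N_T for an i.i.d. binary sample."""
    n_heads = sequence.count("H")
    n_tails = sequence.count("T")
    return theta ** n_heads * (1 - theta) ** n_tails

seq = ["H", "T", "T", "H", "H"]  # the slide's example sequence
print(likelihood(0.6, seq))  # 0.6^3 * 0.4^2 = 0.03456
print(likelihood(0.5, seq))  # 0.5^5 = 0.03125, lower than at 0.6
```

Evaluating at several values of θ traces out the curve in the slide's plot, which peaks at θ = 0.6 for this sequence.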
6 Sufficient Statistics. To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails): L(θ) = θ^{N_H} (1 − θ)^{N_T}. N_H and N_T are sufficient statistics for the binomial distribution.
7 Sufficient Statistics. A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistic if, for any two datasets D and D′, s(D) = s(D′) implies L_D(θ) = L_D′(θ). [Diagram: many datasets mapping to fewer statistics.]
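The defining property can be checked directly for the binomial case: two datasets with the same counts (N_H, N_T) yield identical likelihood functions, whatever the order of the tosses. A minimal sketch (helper name is illustrative):

```python
def likelihood(theta, n_heads, n_tails):
    """Binomial likelihood written in terms of the sufficient statistics."""
    return theta ** n_heads * (1 - theta) ** n_tails

D1 = "HTTHH"
D2 = "HHHTT"  # different ordering, same counts
s1 = (D1.count("H"), D1.count("T"))
s2 = (D2.count("H"), D2.count("T"))
assert s1 == s2  # s(D) = s(D')
for theta in (0.2, 0.5, 0.8):
    # s(D) = s(D') implies L_D(theta) = L_D'(theta)
    assert likelihood(theta, *s1) == likelihood(theta, *s2)
```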
8 Maximum Likelihood Estimation. MLE Principle: choose the parameters that maximize the likelihood function. This is one of the most commonly used estimators in statistics, and it is intuitively appealing. One usually maximizes the log-likelihood function, defined as ℓ_D(θ) = log_e L_D(θ).
9 Example: MLE in Binomial Data. Applying the MLE principle (setting the derivative of ℓ(θ) to zero), we get θ̂ = N_H / (N_H + N_T). Example: (N_H, N_T) = (3, 2); the MLE estimate is 3/5 = 0.6, which coincides with what one would expect. [Plot: L(θ) with its maximum at θ = 0.6.]
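The closed-form binomial MLE from the slide is a one-liner; the function name is an illustrative choice.

```python
def mle_binomial(n_heads, n_tails):
    """theta_hat = N_H / (N_H + N_T), the maximizer of the binomial likelihood."""
    return n_heads / (n_heads + n_tails)

print(mle_binomial(3, 2))  # 0.6, matching the slide's (N_H, N_T) = (3, 2) example
```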
10 From Binomial to Multinomial. Suppose X can take the values 1, 2, …, K (for example, a die has 6 sides). We want to learn the parameters θ_1, θ_2, …, θ_K. Sufficient statistics: N_1, N_2, …, N_K, the number of times each outcome is observed. Likelihood function: L(θ) = ∏_{k=1}^{K} θ_k^{N_k}. MLE: θ̂_k = N_k / ∑_{ℓ} N_ℓ.
11 Example: Multinomial. Let x = x_1 x_2 … x_n be a protein sequence. We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids. N_1, N_2, …, N_20 are the number of times each amino acid is observed in the sequence. Likelihood function: L(q) = ∏_{i=1}^{20} q_i^{N_i}. MLE: q̂_i = N_i / n.
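The multinomial MLE is just normalized counts. A minimal sketch; the peptide string below is made up for illustration, not taken from any database.

```python
from collections import Counter

def mle_multinomial(sequence):
    """q_hat_i = N_i / n: each outcome's count divided by the sequence length."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

# Hypothetical short peptide; real frequency estimates would use many sequences.
print(mle_multinomial("MKVLAAGLLK"))  # e.g. 'L' appears 3 times in 10, so q_hat_L = 0.3
```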
12 Is MLE all we need? Suppose that after 10 observations, the MLE is P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss? Suppose now that after 10 observations, the MLE is P(H) = 0.7 for a coin. Would you place the same bet? Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have the same frequencies and some have low frequencies. How would one use this information?
13 Bayes’ rule. Bayes’ rule: P(A | B) = P(B | A) P(A) / P(B). It holds because P(A | B) P(B) = P(A, B) = P(B | A) P(A).
14 Example: Dishonest Casino. A casino uses two kinds of dice: 99% are fair, and 1% are loaded (on a loaded die, 6 comes up 50% of the time). We pick a die at random, roll it 3 times, and get 3 consecutive sixes. What is the probability that the die is loaded?
15 Dishonest Casino (cont.). The solution is based on Bayes’ rule and the fact that while P(loaded | 3 sixes) is not known, the other terms in Bayes’ rule are: P(3 sixes | loaded) = (0.5)³; P(loaded) = 0.01; and P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | not loaded) (1 − P(loaded)), where P(3 sixes | not loaded) = (1/6)³.
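Plugging the slide's numbers into Bayes' rule gives the posterior directly:

```python
p_loaded = 0.01
p_3sixes_loaded = 0.5 ** 3       # = 0.125
p_3sixes_fair = (1 / 6) ** 3     # a fair die shows 6 with probability 1/6
p_3sixes = p_3sixes_loaded * p_loaded + p_3sixes_fair * (1 - p_loaded)

posterior = p_3sixes_loaded * p_loaded / p_3sixes
print(round(posterior, 3))  # ≈ 0.214
```

Even after three sixes in a row, the die is still more likely to be fair, because the 1% prior on loaded dice is so small.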
17 Biological Example: Proteins. Extracellular proteins have a slightly different amino acid composition than intracellular proteins. From a large enough protein database (SWISS-PROT), we can get the following: p(ext), the probability that any new sequence is extracellular; p(int), the probability that any new sequence is intracellular; p(a_i | ext), the frequency of amino acid a_i for extracellular proteins; p(a_i | int), the frequency of amino acid a_i for intracellular proteins.
18 Biological Example: Proteins (cont.). What is the probability that a given new protein sequence x = x_1 x_2 … x_n is extracellular? Assuming that every sequence is either extracellular or intracellular (but not both), we can write p(int) = 1 − p(ext). Thus, p(x) = p(x | ext) p(ext) + p(x | int) p(int).
19 Biological Example: Proteins (cont.). Using conditional probability (and assuming the positions are independent) we get p(x | ext) = ∏_{i=1}^{n} p(x_i | ext), and similarly for p(x | int). The probabilities p(int) and p(ext) are called the prior probabilities; the probability p(ext | x) is called the posterior probability. By Bayes’ theorem, p(ext | x) = p(x | ext) p(ext) / p(x).
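The whole protein calculation fits in a short sketch. The priors and per-amino-acid frequencies below are made-up illustrative numbers, not values estimated from SWISS-PROT, and the log-space trick is a standard numerical precaution rather than part of the slide.

```python
import math

# Hypothetical priors and frequencies over a toy 3-letter alphabet.
p_ext, p_int = 0.3, 0.7
freq_ext = {"A": 0.08, "C": 0.04, "D": 0.06}
freq_int = {"A": 0.07, "C": 0.01, "D": 0.05}

def posterior_ext(x):
    """p(ext | x) via Bayes' rule, assuming independent positions."""
    log_ext = math.log(p_ext) + sum(math.log(freq_ext[a]) for a in x)
    log_int = math.log(p_int) + sum(math.log(freq_int[a]) for a in x)
    # Normalize in log space to avoid underflow on long sequences.
    m = max(log_ext, log_int)
    num = math.exp(log_ext - m)
    return num / (num + math.exp(log_int - m))

print(round(posterior_ext("CCAD"), 3))
```

Summing logs rather than multiplying raw probabilities matters in practice: for realistic protein lengths, a product of hundreds of frequencies underflows to zero in floating point.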