# Parameter Estimation Using Likelihood Functions (Tutorial #1)



This class has been cut and slightly edited from Nir Friedman's full course of 12 lectures, which is available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger and Ydo Wexler.

## 3. Example: Binomial Experiment

When tossed, a thumbtack can land in one of two positions: Head or Tail. We denote by θ the (unknown) probability P(H).

Estimation task: given a sequence of toss samples x[1], x[2], …, x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.

## 4. Statistical Parameter Fitting

Consider instances x[1], x[2], …, x[M] such that:

- the set of values that x can take is known,
- each is sampled from the same distribution,
- each is sampled independently of the rest.

Such samples are called i.i.d. (independent and identically distributed). The task is to find a vector of parameters Θ that generated the given data; this parameter vector can then be used to predict future data.

## 5. The Likelihood Function

How good is a particular θ? That depends on how likely it is to generate the observed data D:

L_D(θ) = P(D | θ) = ∏_m P(x[m] | θ)

The likelihood of the sequence H, T, T, H, H is

L_D(θ) = θ · (1 − θ) · (1 − θ) · θ · θ = θ³(1 − θ)²

(The slide plots L(θ) against θ on [0, 1].)
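The product above can be sketched directly in code; this is a minimal illustration (the function and variable names are ours, not from the slides):

```python
def likelihood(theta, tosses):
    """L(theta): product over i.i.d. tosses of P(x | theta)."""
    L = 1.0
    for x in tosses:
        L *= theta if x == "H" else (1.0 - theta)
    return L

# The sequence from the slide: H, T, T, H, H  =>  L(theta) = theta^3 (1 - theta)^2
seq = ["H", "T", "T", "H", "H"]
```

Evaluating `likelihood(theta, seq)` on a grid of θ values reproduces the curve sketched on the slide.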

## 6. Sufficient Statistics

To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails):

L_D(θ) = θ^(N_H) (1 − θ)^(N_T)

N_H and N_T are sufficient statistics for the binomial distribution.

## 7. Sufficient Statistics

A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood. Formally, s(D) is a sufficient statistic if, for any two datasets D and D′,

s(D) = s(D′) ⇒ L_D(θ) = L_D′(θ)

(The slide shows a diagram mapping datasets to their statistics.)
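A small sketch makes the definition concrete: two datasets with the same counts (the sufficient statistic) have identical likelihood functions, even though the tosses appear in a different order. Names here are illustrative, not from the slides:

```python
def likelihood_from_stats(theta, n_heads, n_tails):
    # The data enters only through the counts (N_H, N_T).
    return theta ** n_heads * (1.0 - theta) ** n_tails

def likelihood(theta, tosses):
    # Full product over the individual tosses.
    L = 1.0
    for x in tosses:
        L *= theta if x == "H" else (1.0 - theta)
    return L

# Two datasets with the same sufficient statistic s(D) = (3 heads, 2 tails):
D1 = ["H", "T", "T", "H", "H"]
D2 = ["T", "T", "H", "H", "H"]
```

For every θ, `likelihood(theta, D1)`, `likelihood(theta, D2)`, and `likelihood_from_stats(theta, 3, 2)` agree.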

## 8. Maximum Likelihood Estimation

MLE principle: choose parameters that maximize the likelihood function.

- This is one of the most commonly used estimators in statistics.
- It is intuitively appealing.
- One usually maximizes the log-likelihood function, defined as l_D(θ) = log_e L_D(θ).

## 9. Example: MLE in Binomial Data

Applying the MLE principle to the binomial likelihood gives

θ̂ = N_H / (N_H + N_T)

Example: (N_H, N_T) = (3, 2). The MLE estimate is 3/5 = 0.6, which coincides with what one would expect.
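A quick sanity check of the closed-form estimate against a brute-force search over θ (the grid search is our own illustration, not part of the lecture):

```python
def mle(n_heads, n_tails):
    # Closed-form argmax of theta^N_H (1 - theta)^N_T on [0, 1].
    return n_heads / (n_heads + n_tails)

def grid_argmax(n_heads, n_tails, steps=100000):
    # Brute-force maximization of the likelihood over a fine grid of theta.
    best_theta, best_L = 0.0, -1.0
    for i in range(steps + 1):
        theta = i / steps
        L = theta ** n_heads * (1.0 - theta) ** n_tails
        if L > best_L:
            best_theta, best_L = theta, L
    return best_theta
```

With (N_H, N_T) = (3, 2), both approaches return θ̂ = 0.6, matching the slide.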

## 10. From Binomial to Multinomial

Suppose X can take the values 1, 2, …, K (for example, a die has 6 sides). We want to learn the parameters θ₁, θ₂, …, θ_K.

Sufficient statistics: N₁, N₂, …, N_K, the number of times each outcome is observed.

Likelihood function: L_D(θ) = ∏_k θ_k^(N_k)

MLE: θ̂_k = N_k / ∑_ℓ N_ℓ

## 11. Example: Multinomial

Let x = x₁x₂…x_n be a protein sequence. We want to learn the parameters q₁, q₂, …, q₂₀ corresponding to the frequencies of the 20 amino acids. The sufficient statistics are N₁, N₂, …, N₂₀, the number of times each amino acid is observed in the sequence.

Likelihood function: L_D(q) = ∏_i q_i^(N_i)

MLE: q̂_i = N_i / n
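The multinomial MLE is just normalized counting; a minimal sketch on a made-up sequence (the fragment below is a hypothetical example, not real protein data):

```python
from collections import Counter

def multinomial_mle(sequence):
    """MLE for multinomial parameters: q_i = N_i / n."""
    counts = Counter(sequence)
    n = len(sequence)
    return {symbol: c / n for symbol, c in counts.items()}

# Hypothetical 10-residue fragment, purely for illustration:
freqs = multinomial_mle("MKVLAAGLLA")
```

The estimated frequencies always sum to 1, since the counts sum to n.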

## 12. Is MLE All We Need?

- Suppose that after 10 observations, ML estimates P(H) = 0.7 for the thumbtack. Would you bet on heads for the next toss?
- Suppose now that after 10 observations, ML estimates P(H) = 0.7 for a coin. Would you place the same bet?

Solution: the Bayesian approach, which incorporates your subjective prior knowledge. E.g., you may know a priori that some amino acids have the same frequencies and some have low frequencies. How would one use this information?

## 13. Bayes' Rule

Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)

It holds because the joint probability can be factored both ways:

P(A, B) = P(A | B) P(B) = P(B | A) P(A)

## 14. Example: Dishonest Casino

A casino uses two kinds of dice:

- 99% are fair;
- 1% are loaded: a six comes up 50% of the time.

We pick a die at random, roll it 3 times, and get 3 consecutive sixes. What is the probability that the die is loaded?

## 15. Dishonest Casino (cont.)

The solution is based on Bayes' rule and the fact that while P(loaded | 3 sixes) is not known, the other three terms in Bayes' rule are known, namely:

- P(3 sixes | loaded) = (0.5)³
- P(loaded) = 0.01
- P(3 sixes) = P(3 sixes | loaded) P(loaded) + P(3 sixes | not loaded) (1 − P(loaded)), where P(3 sixes | not loaded) = (1/6)³

## 16. Dishonest Casino (cont.)

Plugging in the numbers:

P(loaded | 3 sixes) = (0.5)³ · 0.01 / ((0.5)³ · 0.01 + (1/6)³ · 0.99) ≈ 0.21

So even after three consecutive sixes, the die is still most likely fair.
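The casino calculation can be sketched in a few lines, using exactly the quantities named on the previous slide:

```python
# Posterior probability that the die is loaded after three sixes, via Bayes' rule.
p_loaded = 0.01
p_sixes_given_loaded = 0.5 ** 3          # loaded die: six with probability 0.5
p_sixes_given_fair = (1.0 / 6.0) ** 3    # fair die: six with probability 1/6

# Marginal probability of the evidence (law of total probability).
p_sixes = (p_sixes_given_loaded * p_loaded
           + p_sixes_given_fair * (1.0 - p_loaded))

p_loaded_given_sixes = p_sixes_given_loaded * p_loaded / p_sixes
```

The posterior works out to 3/14 ≈ 0.214: the prior P(loaded) = 0.01 is strong enough that three sixes still leave the fair die more likely.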

## 17. Biological Example: Proteins

Extracellular proteins have a slightly different amino acid composition than intracellular proteins. From a large enough protein database (SWISS-PROT), we can estimate the following:

- p(ext): the probability that any new sequence is extracellular
- p(int): the probability that any new sequence is intracellular
- p(a_i | ext): the frequency of amino acid a_i in extracellular proteins
- p(a_i | int): the frequency of amino acid a_i in intracellular proteins

## 18. Biological Example: Proteins (cont.)

What is the probability that a given new protein sequence x = x₁x₂…x_n is extracellular? Assuming that every sequence is either extracellular or intracellular (but not both), we can write

p(ext) + p(int) = 1

Thus, p(x) = p(ext) p(x | ext) + p(int) p(x | int).

## 19. Biological Example: Proteins (cont.)

Using conditional probability (and treating the amino acids as independent given the class) we get

p(x | ext) = ∏_i p(x_i | ext),  p(x | int) = ∏_i p(x_i | int)

The probabilities p(int) and p(ext) are called the prior probabilities; P(ext | x) is called the posterior probability. By Bayes' theorem,

p(ext | x) = p(ext) ∏_i p(x_i | ext) / (p(ext) ∏_i p(x_i | ext) + p(int) ∏_i p(x_i | int))
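A minimal sketch of this posterior computation. All numbers below (priors, the tiny three-letter frequency tables) are hypothetical placeholders, not values from SWISS-PROT, and `posterior_ext` is our own illustrative name:

```python
# Hypothetical priors and per-amino-acid frequencies (toy 3-letter alphabet).
p_ext, p_int = 0.4, 0.6
p_aa_ext = {"A": 0.1, "C": 0.3, "D": 0.6}   # hypothetical p(a_i | ext)
p_aa_int = {"A": 0.3, "C": 0.4, "D": 0.3}   # hypothetical p(a_i | int)

def posterior_ext(x):
    """P(ext | x) by Bayes' theorem, assuming independent positions."""
    like_ext = like_int = 1.0
    for aa in x:
        like_ext *= p_aa_ext[aa]
        like_int *= p_aa_int[aa]
    num = p_ext * like_ext
    return num / (num + p_int * like_int)
```

For long sequences one would sum log-probabilities instead of multiplying, to avoid floating-point underflow.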

