Maximum Likelihood Method (MLM)
880.P20 Winter 2006, Richard Kass
Presentation transcript:

Slide 1: Maximum Likelihood Method (MLM)
Does this procedure make sense? The MLM answers this question and provides a method for estimating parameters from existing data. The probability of observing the sample (x_1, ..., x_n) drawn from a pdf f(x|α) is proportional to

L(\alpha) = \prod_{i=1}^{n} f(x_i|\alpha);

we drop the dx^n since it is just a proportionality constant.
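As a minimal sketch of this recipe (the four data values and the choice of a unit-width Gaussian pdf are assumptions for illustration), the likelihood is just the product of pdf values at the measurements, and we maximize its log:

data = {4.9, 5.3, 4.7, 5.1};
lnL[mu_] := Total[Log[PDF[NormalDistribution[mu, 1], #]] & /@ data];  (* ln of the product of pdf values *)
NMaximize[lnL[m], m]  (* the maximum lands at the sample average, 5.0 *)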

Slide 2: Maximum Likelihood Method (MLM)
For a Gaussian pdf with known σ, maximizing the likelihood with respect to the mean gives the sample average!
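A sketch of the standard derivation behind this slide's punchline (Gaussian pdf, σ known):

\ln L(\mu) = -n\ln\!\left(\sigma\sqrt{2\pi}\right) - \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2},
\qquad
\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n}\frac{x_i-\mu}{\sigma^2} = 0
\;\Longrightarrow\;
\mu^* = \frac{1}{n}\sum_{i=1}^{n} x_i .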

Slide 3: Maximum Likelihood Method (MLM)
The ML estimate of the Gaussian mean is the average, and no unbiased estimator can do better: its variance saturates the Cramer-Rao bound.
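For reference, the Cramer-Rao bound states that the variance of any unbiased estimator is limited by the information in the likelihood:

\mathrm{Var}(\hat\alpha) \;\ge\; \frac{1}{E\!\left[-\,\partial^2 \ln L/\partial\alpha^2\right]},

and for the Gaussian mean the ML estimate reaches the bound: Var(μ*) = σ²/n.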

Slide 4: Errors & Maximum Likelihood Method (MLM)
How do we calculate errors (σ's) using the MLM? Start by looking at the case where we have a Gaussian pdf. The likelihood function is

L = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right).

It is easier to work with lnL:

\ln L = -n\ln\!\left(\sigma\sqrt{2\pi}\right) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}.

If we take two derivatives of lnL with respect to μ we get

\frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{n}{\sigma^2},
\qquad
\sigma_\mu^2 = \left(-\frac{\partial^2 \ln L}{\partial \mu^2}\right)^{-1}.

For the case of a Gaussian pdf we get the familiar result: σ_μ² = σ²/n. The big news here is that the variance of the parameter of interest is related to the 2nd derivative of lnL. Since our example uses a Gaussian pdf, the result is exact. More importantly, the result is asymptotically true for ALL pdfs, since for large samples (n → ∞) all likelihood functions become "Gaussian".
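As a numerical sanity check of this curvature recipe, a minimal sketch (same made-up four-point sample as above, σ = 1 known):

data = {4.9, 5.3, 4.7, 5.1};
lnL[mu_] := Total[Log[PDF[NormalDistribution[mu, 1], #]] & /@ data];
curv = D[lnL[m], {m, 2}] /. m -> Mean[data];  (* second derivative of lnL at the maximum *)
Sqrt[-1/curv]  (* 0.5, i.e. sigma/Sqrt[n] = 1/Sqrt[4] *)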

Slide 5: Errors & MLM
The previous example was for one parameter. We can generalize the result to the case where several parameters (α_1, α_2, ..., α_n) are determined from the likelihood function. Here V_ij is a matrix (the "covariance matrix" or "error matrix"):

(V^{-1})_{ij} = -\frac{\partial^2 \ln L}{\partial \alpha_i \,\partial \alpha_j},

and it is evaluated at the values of (α_1, α_2, ..., α_n) that maximize the likelihood function. In practice it is often very difficult or impossible to evaluate the 2nd derivatives analytically. The procedure most often used to determine the variances of the parameters relies on the property that the likelihood function becomes Gaussian (or parabolic) asymptotically. We expand lnL about the ML estimate of the parameters. For the one-parameter case we have

\ln L(\alpha) = \ln L(\alpha^*) + \left.\frac{\partial \ln L}{\partial \alpha}\right|_{\alpha^*}\!(\alpha-\alpha^*) + \frac{1}{2}\left.\frac{\partial^2 \ln L}{\partial \alpha^2}\right|_{\alpha^*}\!(\alpha-\alpha^*)^2 + \cdots

Since we are evaluating lnL at the value of α (= α*) that maximizes L, the term with the 1st derivative is zero. Using the expression for the variance of α on the previous page and neglecting higher-order terms we find

\ln L(\alpha^* \pm k\sigma_{\alpha^*}) = \ln L(\alpha^*) - \frac{k^2}{2}.

Thus we can determine the ±kσ limits on the parameters by finding the values where lnL decreases by k²/2 from its maximum value. This is what MINUIT does!
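A minimal sketch of this Δ(lnL) = 1/2 scan, again on the made-up four-point Gaussian sample (σ = 1 known); it reproduces the same 0.5 error that the curvature gave:

data = {4.9, 5.3, 4.7, 5.1};
lnL[mu_] := Total[Log[PDF[NormalDistribution[mu, 1], #]] & /@ data];
{lmax, fit} = NMaximize[lnL[m], m];
muStar = m /. fit;
lo = m /. FindRoot[lnL[m] == lmax - 1/2, {m, muStar - 1}];  (* lower 1-sigma point *)
hi = m /. FindRoot[lnL[m] == lmax - 1/2, {m, muStar + 1}];  (* upper 1-sigma point *)
{muStar, lo, hi}  (* {5.0, 4.5, 5.5} *)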

Slide 6: Example: Log-Likelihood Errors & MLM
Example: exponential decay.
- Generate events according to an exponential distribution with true mean lifetime τ_0 = 100: generate decay times from an exponential using τ_i = -τ_0 ln r_i (r_i uniform on (0,1)).
- Calculate lnL vs τ; find the maximum of lnL and the points where lnL = lnL_max - 1/2 (the "1σ points").
- Compare the errors from the "exact" formula with the log-likelihood points. The variance of an exponential pdf with mean lifetime τ is σ² = τ²/n.

Log-likelihood function for 10 events: lnL_max at τ = 189; 1σ points (140, 265) vs exact (129, 245). L is not Gaussian.
The ten events: 1104.082, 220.056, 27.039, 171.492, 10.217, 11.671, 94.930, 74.246, 12.534, 168.319.

Log-likelihood function for 10^4 events: lnL_max at τ = 100.8; 1σ points (99.8, 101.8) vs exact (99.8, 101.8). L is well fit by a Gaussian.
[Figures: lnL vs τ for 10 events and for 10^4 events; the 10^4-event curve is fit by the parabola y = m3 - (τ - m1)²/(2 m2²) with m1 = 100.8 ± 0.013 and m2 = 1.01 ± 0.009.]
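A sketch reproducing the 10-event half of this example with the events listed above (for an exponential pdf, lnL(τ) = -n ln τ - Σ t_i/τ, and the analytic ML estimate is the sample mean):

t = {1104.082, 220.056, 27.039, 171.492, 10.217, 11.671, 94.930, 74.246, 12.534, 168.319};
n = Length[t];
lnL[tau_] := -n Log[tau] - Total[t]/tau;
tauHat = Mean[t];  (* analytic maximum, ~189 *)
lmax = lnL[tauHat];
lo = x /. FindRoot[lnL[x] == lmax - 1/2, {x, 0.7 tauHat}];
hi = x /. FindRoot[lnL[x] == lmax - 1/2, {x, 1.6 tauHat}];
{tauHat, lo, hi}  (* roughly {189, 140, 265}, the slide's 1-sigma points *)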

Slide 7: Determining the Slope and Intercept with MLM
Example: MLM and determining the slope and intercept of a line. Assume we have a set of measurements (x_1, y_1, σ_1), (x_2, y_2, σ_2), ..., (x_n, y_n, σ_n), the points are thought to come from a straight line y = α + βx, and the measurements come from a Gaussian pdf. The likelihood function is

L = \prod_{i=1}^{n} \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left(-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma_i^2}\right).

We wish to find the α and β that maximize the likelihood function L. Thus we need to take some derivatives:

\frac{\partial \ln L}{\partial \alpha} = \sum_{i=1}^{n}\frac{y_i - \alpha - \beta x_i}{\sigma_i^2} = 0,
\qquad
\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n}\frac{x_i\,(y_i - \alpha - \beta x_i)}{\sigma_i^2} = 0.

We have to solve the two equations for the two unknowns, α and β. We can get an exact solution since these equations are linear in α and β; we just have to invert a matrix.
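A sketch of the resulting linear system for assumed data (the x, y, and σ values below are made up for illustration); the matrix here is exactly the V^{-1} of the next slide:

xd = {1., 2., 3., 4., 5.};
yd = {2.1, 3.9, 6.2, 7.8, 10.1};
sd = {0.2, 0.2, 0.3, 0.3, 0.4};
w = 1/sd^2;  (* weights 1/sigma_i^2 *)
mat = {{Total[w], Total[w xd]}, {Total[w xd], Total[w xd^2]}};  (* normal-equation matrix *)
rhs = {Total[w yd], Total[w xd yd]};
{alphaHat, betaHat} = LinearSolve[mat, rhs]  (* intercept and slope *)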

Slide 8: Determining the Errors on the Slope and Intercept with MLM
Let's calculate the error (covariance) matrix for α and β. Taking the second derivatives of lnL from the previous slide gives

V^{-1} = \begin{pmatrix} \sum_i 1/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i x_i^2/\sigma_i^2 \end{pmatrix},

evaluated at the solution. Note: We could also derive the variances of α and β just using propagation of errors on the formulas for α and β.
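Since the matrix is 2×2, the inversion can be written out explicitly (Δ is the determinant):

V = \frac{1}{\Delta}\begin{pmatrix} \sum x_i^2/\sigma_i^2 & -\sum x_i/\sigma_i^2 \\ -\sum x_i/\sigma_i^2 & \sum 1/\sigma_i^2 \end{pmatrix},
\qquad
\Delta = \left(\sum \frac{1}{\sigma_i^2}\right)\!\left(\sum \frac{x_i^2}{\sigma_i^2}\right) - \left(\sum \frac{x_i}{\sigma_i^2}\right)^{\!2},

so σ_α² = V_11, σ_β² = V_22, and cov(α, β) = V_12. In code this is just Inverse[mat] applied to the normal-equation matrix of the previous sketch.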

Slide 9: Chi-Square (χ²) Distribution
Assume that our measurements (x_i, σ_i) come from a Gaussian pdf with mean μ. Define a statistic called chi-square:

\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma_i^2}.

It can be shown that the pdf for χ² is

p(\chi^2, n) = \frac{(\chi^2)^{n/2-1}\, e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}.

This is a continuous pdf. It is a function of two variables: χ² and n = the number of degrees of freedom (Γ = the Gamma function). [Figure: χ² distribution for different numbers of degrees of freedom.]

A few words about the number of degrees of freedom n:
n = (# of data points) - (# of parameters calculated from the data points)
Reminder: if you collected N events in an experiment and you histogram your data in n bins before performing the fit, then you have n data points!

EXAMPLE: You count cosmic-ray events in 15-second intervals and sort the data into 5 bins:
number of intervals with 0 cosmic rays: 2
number of intervals with 1 cosmic ray: 7
number of intervals with 2 cosmic rays: 6
number of intervals with 3 cosmic rays: 3
number of intervals with 4 cosmic rays: 2
Although there were 36 cosmic rays in your sample, you have only 5 data points.

EXAMPLE: We have 10 data points, with μ and σ the mean and standard deviation of the data set.
If we calculate μ and σ from the 10 data points, then n = 8.
If we know μ and calculate σ, OR if we know σ and calculate μ, then n = 9.
If we know μ and σ, then n = 10.

RULE of THUMB: a good fit has χ²/DOF ≈ 1.
For n ≥ 20, the tail probability P(χ² > χ²_obs) can be approximated using a unit Gaussian pdf with y = (2χ²_obs)^{1/2} - (2n-1)^{1/2}.
A common approximation (useful for the Poisson case) is "Pearson's χ²":

\chi^2 = \sum_{i} \frac{(N_i - n_i^{\mathrm{exp}})^2}{n_i^{\mathrm{exp}}},

which is approximately χ²-distributed with n-1 DOF.
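A quick sketch comparing the exact tail probability with the large-n Gaussian approximation above (the values χ² = 30 and n = 25 are assumptions for illustration):

chisq = 30.; n = 25;
exact = 1 - CDF[ChiSquareDistribution[n], chisq]
approx = 1 - CDF[NormalDistribution[0, 1], Sqrt[2 chisq] - Sqrt[2 n - 1]]
(* both come out ~0.22 *)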

Slide 10: MLM, Chi-Square, and Least Squares Fitting
Assume we have n data points of the form (y_i, σ_i) and we believe a functional relationship exists between the points: y = f(x, a, b, ...). In addition, assume we know (exactly) the x_i that goes with each y_i. We wish to determine the parameters a, b, .... A common procedure is to minimize the following χ² with respect to the parameters:

\chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - f(x_i, a, b, \ldots)\right)^2}{\sigma_i^2}.

If the y_i's are from a Gaussian pdf, then minimizing the χ² is equivalent to the MLM. However, often the y_i's are NOT from a Gaussian pdf. In these instances we call this technique "χ² fitting" or "least squares fitting". Strictly speaking, we can only use a χ² probability table when y is from a Gaussian pdf. However, there are many instances where, even for non-Gaussian pdfs, the above sum approximates a χ² pdf. From a common-sense point of view, minimizing the above sum makes sense regardless of the underlying pdf.
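A minimal sketch of χ² fitting by direct numerical minimization (the data, errors, exponential model, and parameter bounds below are all assumptions for illustration):

xd = {1., 2., 3., 4.};
yd = {1.8, 4.3, 8.7, 16.2};
sd = {0.3, 0.4, 0.6, 0.9};
chi2[a_?NumericQ, b_?NumericQ] := Total[((yd - a Exp[b xd])/sd)^2];  (* the chi-square sum *)
NMinimize[{chi2[a, b], 0 < a < 10, 0 < b < 2}, {a, b}]  (* minimum chi^2 and best-fit a, b *)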

Slide 11: Least Squares Fitting Example
Example (Leo's Example 4.8, p. 107): The following data from a radioactive source were taken at 15 s intervals. Determine the lifetime (τ) of the source. [Data table not preserved in the transcript.] The pdf that describes radioactivity (or the decay of a charmed particle) is

N(t) = N(0)\,e^{-t/\tau}.

(Technically the pdf is |dN(t)/(N(0)dt)| = N(t)/(N(0)τ); Leo has a "1" here.) As written, the above pdf is not linear in τ. We can turn this into a linear problem by taking the natural log of both sides:

\ln N(t) = \ln N(0) - t/\tau,

a straight line in t with slope D = -1/τ. We can now use the methods of linear least squares to find D and then τ. In doing the LSQ fit, what do we use to weight the data points? The fluctuations in each bin are governed by Poisson statistics: σ_i² = N_i. However, in this problem the fitting variable is lnN, so we must use propagation of errors to transform the variances of N into the variances of lnN: σ_lnN = σ_N/N = 1/√N.
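The mechanics of the weighted log-linear fit, as a sketch (the slide's table of counts did not survive transcription, so the counts below are stand-in values, not Leo's data):

tt = Range[0, 135, 15];  (* 15 s interval times *)
nn = {106, 80, 98, 75, 74, 73, 49, 38, 37, 22};  (* stand-in bin counts *)
yy = N[Log[nn]];
w = N[nn];  (* sigma_lnN = 1/Sqrt[N]  =>  weight w = N *)
mat = {{Total[w], Total[w tt]}, {Total[w tt], Total[w tt^2]}};
{lnN0, DD} = LinearSolve[mat, {Total[w yy], Total[w tt yy]}];
tau = -1/DD  (* lifetime from the slope *)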

Slide 12: Least Squares Fitting, Exponential Example
The slope of the line is given by the weighted least-squares solution:

D = \frac{\left(\sum w_i\right)\left(\sum w_i t_i y_i\right) - \left(\sum w_i t_i\right)\left(\sum w_i y_i\right)}{\left(\sum w_i\right)\left(\sum w_i t_i^2\right) - \left(\sum w_i t_i\right)^2},
\qquad w_i = 1/\sigma_i^2 .

Thus the lifetime is τ = -1/D = 110.7 s. Propagating the error on the slope gives τ = 110.7 ± 12.3 s.
Caution: Leo has a factor of ½ in his error matrix (V^{-1})_ij, Eq. 4.72: he minimizes ½ Σ[(y_i - f(x_i))/σ_i]², whereas using the MLM we minimized χ² = Σ[(y_i - f(x_i))/σ_i]².
Note: fitting without weighting yields τ = 96.8 s.
[Figure: data and the line of "best fit".]
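The step from the slope error σ_D to the lifetime error is one line of error propagation on τ = -1/D:

\sigma_\tau = \left|\frac{d\tau}{dD}\right|\sigma_D = \frac{\sigma_D}{D^2} = \tau^2\,\sigma_D,

which is how the ±12.3 s follows from the fitted slope and its error.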

Slide 13: Least Squares Fitting, Exponential Example (continued)
We can calculate the χ² to see how "good" the data fit an exponential decay distribution, using the Poisson approximation (the expected bin content serves as its own variance):

\chi^2 = \sum_{i} \frac{\left(N_i - A e^{-t_i/\tau}\right)^2}{A e^{-t_i/\tau}}.

For this problem: lnA = 4.725, so A = 112.73, and τ = 110.7 s.
Mathematica calculation (csq must start at 0; cnt[i] and x[i] are the measured counts and bin times, with a = 112.73 and tau = 110.7 set beforehand):
csq = 0;
Do[csq += (cnt[i] - a*Exp[-x[i]/tau])^2/(a*Exp[-x[i]/tau]), {i, 1, 10}];
Print["The chi sq per dof is ", csq/8]
xvt = 1 - CDF[ChiSquareDistribution[8], csq];
Print["The chi sq prob. is ", 100*xvt, "%"]
Output: "The chi sq per dof is 1.96"; "The chi sq prob. is 4.9%". This is not such a good fit, since the probability is only ~4.9%.

Slide 14: Extended MLM
Often we want to do an MLM fit to determine the numbers of signal and background events. Let's assume we know the pdfs that describe the signal (p_s) and background (p_b), and that the pdfs depend on some measured quantity x (e.g. energy, momentum, Cherenkov angle, ...). We can write the likelihood for a single event i as

L_i = f_s\, p_s(x_i) + (1 - f_s)\, p_b(x_i),

with f_s the fraction of signal events in the sample, so that the number of signal events is N_s = f_s N. The likelihood function to maximize (with respect to f_s) is

L = \prod_{i=1}^{N}\left[f_s\, p_s(x_i) + (1 - f_s)\, p_b(x_i)\right].

Usually there is no closed-form solution for f_s. There are several drawbacks to this solution: 1) the numbers of signal and background events are 100% correlated; 2) the (Poisson) fluctuations in the total number of events N are not taken into account. Another solution, which explicitly takes 2) into account, is the EXTENDED MLM:

L = \frac{e^{-\nu}\,\nu^{N}}{N!}\prod_{i=1}^{N}\frac{N_s\, p_s(x_i) + N_b\, p_b(x_i)}{\nu}.

Here ν = N_s + N_b, so we can rewrite the likelihood function as

L = \frac{e^{-(N_s+N_b)}}{N!}\prod_{i=1}^{N}\left[N_s\, p_s(x_i) + N_b\, p_b(x_i)\right].

The N! term drops out when we take derivatives to maximize L. We maximize L in terms of N_s and N_b. (If N_s and N_b are Poisson, then so is their sum ν for fixed N.)
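A sketch of an extended ML fit on toy data (the Gaussian signal on a flat background, the range [0, 10], the true yields, and the seed are all assumptions for illustration):

SeedRandom[3];
ps[x_] := PDF[NormalDistribution[5, 0.5], x];  (* signal pdf *)
pb[x_] := 1/10;  (* flat background pdf on [0, 10] *)
xs = Join[RandomVariate[NormalDistribution[5, 0.5], 50], RandomReal[{0, 10}, 200]];
(* extended log-likelihood: -(Ns+Nb) + Sum ln[Ns ps + Nb pb]; the N! is dropped *)
lnL[ns_?NumericQ, nb_?NumericQ] := -(ns + nb) + Total[Log[ns ps[#] + nb pb[#]] & /@ xs];
NMaximize[{lnL[ns, nb], ns > 0, nb > 0}, {ns, nb}]  (* best-fit Ns, Nb; generated with Ns = 50, Nb = 200 *)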

Slide 15: Extended MLM Example: BF(B → D⁰K*)
Event yields are determined from an unbinned extended ML fit in the region 5.2 < m_ES < 5.3 GeV/c². Choose simple PDFs to fit the m_ES distributions: A = Argus function, G = Gaussian. Perform the ML fits simultaneously in 3 regions; in each region fit the Kπ, Kππ⁰, and K3π m_ES distributions (k = 1, 2, 3), i.e. 9 PDFs in all.
I) ΔE sideband (-100 MeV < ΔE < -60 MeV and 60 MeV < ΔE < 200 MeV): pdf A_k. There should be no B's in this region.
II) D⁰ sideband (cut on |m_D - m_D,PDG|): "fake" D⁰'s give fake B's; this region takes into account the "doubly peaking" backgrounds (DP). pdf: (N_noP A + N_DP G)_k.
III) Signal region (|ΔE| < 25 MeV): pdf (N_qq̄ A + s N_DP G + N_sig G)_k, where the factor s scales the N_DP found in the D⁰ sideband fit.
The fit yields ~520 signal events.

