Use of likelihood fits in HEP
Some basics of parameter estimation; examples; good practice; several real-world examples of increasing complexity.
Parts of this talk taken (with permission) from Gerhard Raven, NIKHEF and VU Amsterdam.

Parameter Estimation
- Given the theoretical distribution with parameters p, what can we say about the data?
- Need a procedure to estimate p from D(x). Common technique: a fit!
[Diagram: Theory/Model --(probability)--> Data; Data --(statistical inference)--> Theory/Model]

A well known estimator: the χ² fit
- Given a set of points {(x_i, y_i, σ_i)} and a function f(x,p), define the χ²:
  \chi^2(p) = \sum_i \left( \frac{y_i - f(x_i;p)}{\sigma_i} \right)^2
- Estimate the parameters by minimizing χ²(p) with respect to all parameters p_i. In practice, look for
  \frac{\partial \chi^2}{\partial p_i} = 0
  - The value of p_i at the minimum is the estimate for p_i.
  - The error on p_i is given by the variation of χ² by +1: χ²(p_i ± σ_{p_i}) = χ²_min + 1.
- Well known: but why does it work? Is it always right? Does it always give the best possible error?
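As a minimal sketch (not from the talk), here is the recipe above in plain C++: a one-parameter model f(x;p) = p·x, a brute-force scan for the minimum, and the "χ²_min + 1" rule for the 1σ error. The data values and model are purely illustrative.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y, sigma; };

// chi2(p) = sum_i ((y_i - f(x_i;p)) / sigma_i)^2 for f(x;p) = p*x
double chi2(const std::vector<Point>& data, double p) {
  double sum = 0.0;
  for (const auto& d : data) {
    const double r = (d.y - p * d.x) / d.sigma;
    sum += r * r;
  }
  return sum;
}

int main() {
  std::vector<Point> data = {{1, 2.1, 0.2}, {2, 3.9, 0.2}, {3, 6.2, 0.3}};  // toy data
  double bestP = 0, bestChi2 = 1e30;
  for (double p = 0.0; p < 4.0; p += 0.001) {          // brute-force scan (MINUIT in real life)
    const double c = chi2(data, p);
    if (c < bestChi2) { bestChi2 = c; bestP = p; }
  }
  // 1-sigma error: where chi2 rises by +1 above the minimum
  double errUp = 0;
  for (double p = bestP; p < 4.0; p += 0.001)
    if (chi2(data, p) > bestChi2 + 1.0) { errUp = p - bestP; break; }
  std::printf("p = %.3f +/- %.3f, chi2/ndf = %.2f/%d\n",
              bestP, errUp, bestChi2, (int)data.size() - 1);  // ndf = points - parameters
}
```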

Back to Basics: what is an estimator?
- An estimator is a procedure giving a value for a parameter or a property of a distribution as a function of the actual data values, e.g.
  \hat{\mu} = \frac{1}{N}\sum_i x_i  (estimator of the mean)
  \hat{V} = \frac{1}{N-1}\sum_i (x_i - \hat{\mu})^2  (unbiased estimator of the variance)
- A perfect estimator is
  - Consistent: \lim_{N\to\infty} \hat{p} = p
  - Unbiased: with finite statistics you get the right answer on average, \langle \hat{p} \rangle = p
  - Efficient: the variance V(\hat{p}) is as small as possible; the smallest achievable variance is called the Minimum Variance Bound.
- There are no perfect estimators!

Another common estimator: likelihood
- Definition of likelihood, given data D(x) and a model F(x;p):
  L(p) = \prod_i F(x_i; p)
- For convenience the negative log of the likelihood is often used:
  -\ln L(p) = -\sum_i \ln F(x_i; p)
- Parameters are estimated by maximizing the likelihood, or equivalently minimizing -ln(L).
- NB: functions used in likelihoods must be probability density functions:
  \int F(x;p)\,dx = 1 \quad \text{for all } p
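A minimal sketch of the definition above (illustrative data; the model is a normalized unit-width Gaussian with unknown mean mu):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Normalized PDF: integral over x is 1, as the likelihood requires
double gaussPdf(double x, double mu) {
  const double norm = 1.0 / std::sqrt(2.0 * std::acos(-1.0));  // 1/sqrt(2*pi)
  return norm * std::exp(-0.5 * (x - mu) * (x - mu));
}

// -ln L(mu) = -sum_i ln F(x_i; mu)
double nll(const std::vector<double>& data, double mu) {
  double sum = 0.0;
  for (double x : data) sum -= std::log(gaussPdf(x, mu));
  return sum;
}

int main() {
  std::vector<double> data = {4.8, 5.1, 5.3, 4.9, 5.2};
  for (double mu = 4.8; mu <= 5.31; mu += 0.1)   // coarse scan: minimum near the sample mean
    std::printf("mu = %.1f  -lnL = %.4f\n", mu, nll(data, mu));
}
```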

Variance on ML parameter estimates
- The estimator for the parameter variance is
  \hat{\sigma}^2(\hat{p}) = \left( -\frac{d^2 \ln L}{dp^2} \right)^{-1} \bigg|_{p=\hat{p}}
  i.e. the variance is estimated from the 2nd derivative of -log(L) at the minimum.
  Valid if the estimator is efficient and unbiased!
- Visual interpretation of the variance estimate: Taylor-expand ln(L) around the maximum; ln L drops by 0.5 at \hat{p} \pm \hat{\sigma}, so the points where Δln L = -0.5 mark the 1σ interval.
- From the Rao-Cramer-Frechet inequality,
  V(\hat{p}) \ge \frac{(1 + \partial b/\partial p)^2}{E\left[-\partial^2 \ln L / \partial p^2\right]}
  where b is the bias as a function of p; the inequality becomes an equality in the limit of an efficient estimator.
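A sketch of the prescription above (same toy Gaussian-mean model as before): take the numeric second derivative of -lnL at its minimum. For a unit-width Gaussian the exact answer is 1/sqrt(N), which the finite difference should reproduce.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// -lnL for a unit-width Gaussian, dropping mu-independent constants
double nll(const std::vector<double>& d, double mu) {
  double s = 0.0;
  for (double x : d) s += 0.5 * (x - mu) * (x - mu);
  return s;
}

int main() {
  std::vector<double> data = {4.8, 5.1, 5.3, 4.9, 5.2};
  double mu = 0;                                   // for this model the ML minimum
  for (double x : data) mu += x;                   // is the sample mean
  mu /= data.size();
  const double h = 1e-3;                           // finite-difference step
  const double d2 = (nll(data, mu + h) - 2 * nll(data, mu) + nll(data, mu - h)) / (h * h);
  std::printf("mu = %.3f +/- %.3f (exact: 1/sqrt(N) = %.3f)\n",
              mu, 1.0 / std::sqrt(d2), 1.0 / std::sqrt((double)data.size()));
}
```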

Properties of Maximum Likelihood estimators
- In general, maximum likelihood estimators are
  - Consistent (gives the right answer for N → ∞)
  - Mostly unbiased (bias ∝ 1/N; may need to worry at small N)
  - Efficient for large N (you get the smallest possible error)
  - Invariant: a transformation of parameters will NOT change your answer, e.g. \widehat{p^2} = (\hat{p})^2
- MLE efficiency theorem: the MLE will be unbiased and efficient if an unbiased efficient estimator exists.
  - Proof not discussed here for brevity.
  - Of course this does not guarantee that any MLE is unbiased and efficient for any given problem.
- Use of the 2nd derivative of -log(L) for the variance estimate is usually OK.

More about maximum likelihood estimation
- It's not 'right', it is just sensible.
- It does not give you the 'most likely value of p'; it gives you the value of p for which this data is most likely.
- Numeric methods are often needed to find the maximum of ln(L), especially if there is more than one parameter. Standard tool in HEP: MINUIT.
- Maximum likelihood does not give you a goodness-of-fit measure:
  - If the assumed F(x;p) is not capable of describing your data for any p, the procedure will not complain.
  - The absolute value of L tells you nothing!

Properties of χ² estimators
- The χ² estimator follows from the ML estimator: take the probability density in p for a single data point y_i ± σ_i and function f(x_i;p) to be Gaussian,
  F(y_i; p) \propto \exp\left[ -\frac{1}{2} \left( \frac{y_i - f(x_i;p)}{\sigma_i} \right)^2 \right],
  take the log and sum over all points x_i:
  -\ln L(p) = \tfrac{1}{2} \chi^2(p) + \text{const}
- The χ² estimator therefore inherits the ML properties: efficient, consistent, bias ∝ 1/N, invariant,
  - but only in the limit that the error σ_i is truly Gaussian, i.e. need n_i > 10 if y_i follows a Poisson distribution.
- Bonus: goodness-of-fit measure, χ² ≈ 1 per degree of freedom.

Estimating and interpreting goodness-of-fit
- Fitting determines the best set of parameters of a given model to describe the data. Is 'best' good enough? I.e., is it an adequate description, or are there significant and incompatible differences ('not good enough')?
- Most common test: the χ² test. If f(x) describes the data then χ² ≈ N; if χ² >> N something is wrong.
- How to quantify the meaning of 'large χ²'?

How to quantify the meaning of 'large χ²'
- The probability distribution for χ² with N degrees of freedom is
  f(\chi^2; N) = \frac{(\chi^2)^{N/2-1}\, e^{-\chi^2/2}}{2^{N/2}\,\Gamma(N/2)}
- To make a judgement on goodness-of-fit, the relevant quantity is the integral of the above:
  P(\chi^2; N) = \int_{\chi^2}^{\infty} f(z; N)\, dz
- What does the χ² probability P(χ²,N) mean? It is the probability that a function which does genuinely describe the data on N points would give a χ² as large or larger than the one you already have. Since it is a probability, it is a number in the range [0,1].

Goodness-of-fit: χ² probability example
- Example: suppose you have a function f(x;p) which gives a χ² of 20 for 5 points (histogram bins). It is not impossible that f(x;p) describes the data correctly, just unlikely. How unlikely?
- Note: if the function has been fitted to the data, you need to account for the fact that parameters have been adjusted to describe the data: use n_dof = N_points - N_fit parameters.
- Practical tips:
  - To calculate the probability in PAW: call prob(chi2,ndf)
  - To calculate the probability in ROOT: TMath::Prob(chi2,ndf)
  - For large N, sqrt(2χ²) has a Gaussian distribution with mean sqrt(2N-1) and σ = 1.
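A one-call check of the example above as a small ROOT macro (assuming a ROOT installation; run with e.g. `root -l -q chi2prob.C`):

```cpp
// chi2prob.C -- chi2 probability for the example above: chi2 = 20, 5 bins,
// no fitted parameters. TMath::Prob returns the upper-tail chi2 probability.
#include "TMath.h"
#include <cstdio>

void chi2prob() {
  const double chi2 = 20.0;
  const int ndf = 5;   // subtract the number of fitted parameters if any
  // Prints a small probability (of order 1e-3), i.e. an unlikely fit
  std::printf("P(chi2 >= %.0f | ndf = %d) = %.5f\n",
              chi2, ndf, TMath::Prob(chi2, ndf));
}
```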

Goodness-of-fit: alternatives to χ²
- When the sample size is very small, it may be difficult to find a sensible binning. Look for a binning-free test.
- Kolmogorov test:
  1) Take all data values, arrange them in increasing order and plot the cumulative distribution.
  2) Overlay the cumulative probability distribution of the model.
  - GOF measure: the maximum distance 'd' between the two curves. Large d: bad agreement; small d: good agreement.
- Practical tip: in ROOT, TH1::KolmogorovTest calculates the probability for you. (A hand-rolled sketch of 'd' follows below.)
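The Kolmogorov distance itself is easy to compute by hand; a sketch in plain C++ against an assumed exponential model (τ = 1 and the data values are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> data = {0.1, 0.3, 0.4, 0.9, 1.5, 2.2, 3.0};
  std::sort(data.begin(), data.end());             // step 1: order the data
  const double n = data.size();
  double d = 0.0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    const double cdf = 1.0 - std::exp(-data[i]);   // step 2: model CDF, tau = 1
    // The empirical CDF jumps at each point: check both before and after the jump
    d = std::max(d, std::max(std::fabs(i / n - cdf), std::fabs((i + 1) / n - cdf)));
  }
  std::printf("Kolmogorov distance d = %.3f\n", d); // large d -> bad agreement
}
```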

Maximum likelihood or χ²?
- χ² fit is fastest, easiest:
  - Works fine at high statistics.
  - Gives an absolute goodness-of-fit indication.
  - Makes an (incorrect) Gaussian error assumption on low-statistics bins.
  - Has bias proportional to 1/N.
  - Misses information with feature size < bin size.
- Full maximum likelihood estimators are most robust:
  - No Gaussian assumption made at low statistics.
  - No information lost due to binning.
  - Gives the best error of all methods (especially at low statistics).
  - No intrinsic goodness-of-fit measure, i.e. no way to tell if 'best' is actually 'pretty bad'.
  - Has bias proportional to 1/N.
  - Can be computationally expensive for large N.
- Binned maximum likelihood is in between:
  - Much faster than full maximum likelihood.
  - Correct Poisson treatment of low-statistics bins.
  - Misses information with feature size < bin size.
  - Has bias proportional to 1/N.

Practical estimation: numeric χ² and -log(L) minimization
- For most data analysis problems, minimization of χ² or -log(L) cannot be performed analytically.
  - Need to rely on numeric/computational methods.
  - In more than one dimension this is generally a difficult problem!
- But no need to worry: software exists to solve this problem for you.
  - The function minimization workhorse in HEP for many years: MINUIT.
  - MINUIT does function minimization and error analysis.
  - It is used behind the scenes in the PAW and ROOT fitting interfaces.
  - It produces a lot of useful information that is sometimes overlooked.
- Will look at MINUIT output and functionality in a bit more detail next.

Numeric χ² / -log(L) minimization: proper starting values
- For all but the most trivial scenarios it is not possible to automatically find reasonable starting values of parameters. This may come as a disappointment to some...
- So you need to supply good starting values for your parameters.
  - Reason: there may exist multiple (local) minima in the likelihood or χ², and the minimizer may settle into one far from the true minimum.
- Supplying good initial uncertainties on your parameters helps too.
  - Reason: a too-large error will result in MINUIT coarsely scanning a wide region of parameter space; it may accidentally find a far away local minimum.
[Figure: -log(L) vs p, showing a local minimum next to the true minimum]

Multi-dimensional fits: benefit analysis
- Fits to multi-dimensional data sets offer opportunities but also introduce several headaches.
- Pro:
  - Enhanced sensitivity, because more data and information is used simultaneously.
  - Exploits information in correlations between observables.
- Con:
  - More difficult to visualize the model and the model-data agreement.
  - More room for hard-to-find problems.
  - Just a lot more work.
- It depends very much on your particular analysis whether fitting a variable is better than cutting on it: with no obvious cut it may be worthwhile to include the variable in an n-D fit; where it is obvious where to cut, it is probably not worthwhile.

Ways to construct a multi-D fit model
- Simplest way: take the product of N 1-dim models, e.g.
  F(x,y;p) = f(x;p) \cdot g(y;p)
  This assumes x and y are uncorrelated in the data. If this assumption is unwarranted you may get a wrong result: think and check!
- Harder way: explicitly model the correlations by writing a 2-D model.
- Hybrid approach: use conditional probabilities,
  F(x,y;p) = f(x|y;p) \cdot g(y;p)
  i.e. the probability for x given a value of y, times the probability for y.
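A sketch of the first and third constructions (all functions and numbers are illustrative, not a recipe from the talk):

```cpp
#include <cmath>
#include <cstdio>

double gauss(double x, double mu, double sigma) {
  const double z = (x - mu) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * std::acos(-1.0)));
}

// Factorized 2-D model f(x)*g(y): valid only if x and y are uncorrelated
double pdfProduct(double x, double y) {
  return gauss(x, 0.0, 1.0) * gauss(y, 0.0, 2.0);
}

// Conditional model f(x|y)*g(y): the mean of x depends on y,
// encoding a simple linear correlation between the observables
double pdfConditional(double x, double y) {
  return gauss(x, 0.5 * y, 1.0) * gauss(y, 0.0, 2.0);
}

int main() {
  std::printf("product:     F(1,2) = %.4f\n", pdfProduct(1.0, 2.0));
  std::printf("conditional: F(1,2) = %.4f\n", pdfConditional(1.0, 2.0));
}
```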

Multi-dimensional fits: visualizing your model
- Overlaying a 2-dim PDF on a 2-D (lego) data set doesn't provide much insight.
- 1-D projections are usually easier, but x-y correlations in the data and/or model are difficult to visualize.
- "You cannot do quantitative analysis with 2D plots" (Chris Tully, Princeton)

Multi-dimensional fits: visualizing your model (continued)
- However, plain 1-D projections often don't do justice to your fit.
- Example: a 3-dimensional dataset with 50K events, of which 2500 are signal events; the distributions in x, y and z are chosen identical for simplicity.
- A fit of the 3-dimensional model finds N_sig = 2440 ± 64; this precision is difficult to reconcile with the enormous backgrounds seen in the plain 1-dimensional projections in x, y, z.

Multi-dimensional fits: visualizing your model (continued)
- Reason for the discrepancy between the precise fit result and the large background in the 1-D projection plot: events in the background-dominated regions of the y,z projections can be discarded without loss of signal.
- Improved projection plot: show in the x projection only those events that are likely to be signal in the (y,z) projection of the fit model.
  - Zeroth-order solution: make a box cut in (y,z).
  - Better solution: cut on the signal probability according to the fit model in (y,z).

Multi-dimensional fits: visualizing your model (continued)
- Goal: projection of model and data on x, with a cut on the signal probability in (y,z).
- First task at hand: calculate the signal probability according to the PDF using only the information in the (y,z) variables.
  - Define 2-dimensional signal and background PDFs in (y,z) by integrating out the x variable (thus discarding any information contained in the x dimension).
  - Calculate the signal probability P(y,z) for all data points (x,y,z).
  - Choose a sensible cut on -log(P_SIG(y,z)), separating signal-like from background-like events.

Plotting regions of an N-dim model: case study
- Next: plot the distribution of data and model with a cut on P_SIG(y,z).
  - Data: trivial.
  - Model: calculate the projection of the selected region with a Monte Carlo method:
    1) Generate a toy Monte Carlo dataset D_TOY(x,y,z) from F(x,y,z).
    2) Select the subset of D_TOY with P_SIG(y,z) < C.
    3) Plot the x distribution of the selected subset.
- The likelihood-ratio projection makes the fitted signal (N_SIG = 2440 ± 64) visible, unlike the plain projection shown for comparison.

Alternative: 'sPlots'
- Again, compute the signal probability based on the variables y and z.
- Plot x, weighted with the above signal probability.
- Overlay the signal PDF for x.
- See http://arxiv.org/abs/physics/… and PRL 93 (2004) for more details on sPlots.
[Plots: B0 -> K+ pi- and B0bar -> K- pi+ sPlots]

Multidimensional fits: goodness-of-fit determination
- Warning: goodness-of-fit measures for multi-dimensional fits are difficult.
  - The standard χ² test does not work very well in N dimensions because of the natural occurrence of a large number of empty bins.
  - A simple equivalent of the (unbinned) Kolmogorov test does not exist in more than one dimension.
- This area is still very much a work in progress. Several new ideas have been proposed, but they are sometimes difficult to calculate or not universally suitable. Some examples:
  - Cramer-von Mises (close to Kolmogorov in concept)
  - Anderson-Darling
  - 'Energy' tests
- No magic bullet here. Some references to recent progress: PHYSTAT2001, PHYSTAT2003.

Practical fitting: error propagation between samples
- Common situation: you want to fit a small signal in a large sample.
  - Problem: the small statistics do not constrain the shape of your signal very well.
  - Result: the errors are large.
- Idea: constrain the shape of your signal from a fit to a control sample, a larger and/or cleaner data or MC sample with similar properties.
- Needed: a way to propagate the information from the control sample fit (parameter values and errors) to your signal fit.

Practical fitting: error propagation between samples (continued)
- 0th-order solution: fit the control sample first, the signal sample second, with the signal shape parameters fixed to the values from the control sample fit.
  - The signal fit will give correct parameter estimates,
  - but the error on the signal will be underestimated, because the uncertainties in the determination of the signal shape from the control sample are not included.
- 1st-order solution: repeat the fit on the signal sample at p ± σ_p, observe the difference in the answer and add this difference in quadrature to the error.
  - Problem: this error estimate will be incorrect if there is more than one parameter in the control sample fit and there are correlations between these parameters.
- Best solution: a simultaneous fit.

Practical fitting: simultaneous fit technique
- Given data D_sig(x) with model F_sig(x;p_sig), and data D_ctl(x) with model F_ctl(x;p_ctl):
  - construct χ²_sig(p_sig) and χ²_ctl(p_ctl), and
  - minimize the combined χ²(p_sig,p_ctl) = χ²_sig(p_sig) + χ²_ctl(p_ctl), with the shape parameters shared between the two samples.
- All parameter errors and correlations are automatically propagated.
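A toy illustration of the technique (all numbers invented): a Gaussian mean mu is shared between a small signal sample and a larger control sample, so minimizing the combined χ² automatically propagates the control-sample constraint on mu, and its uncertainty, into the signal result.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Point { double x, y, sigma; };

// chi2 for a Gaussian peak of height `norm` at shared mean `mu` (unit width)
double chi2(const std::vector<Point>& d, double norm, double mu) {
  double s = 0.0;
  for (const auto& p : d) {
    const double f = norm * std::exp(-0.5 * (p.x - mu) * (p.x - mu));
    const double r = (p.y - f) / p.sigma;
    s += r * r;
  }
  return s;
}

int main() {
  std::vector<Point> sig = {{4.5, 12, 4}, {5.0, 17, 4}, {5.5, 13, 4}};     // small, noisy
  std::vector<Point> ctl = {{4.5, 135, 10}, {5.0, 152, 12}, {5.5, 130, 10}}; // large, clean
  double best[3] = {0, 0, 0}, bestChi2 = 1e30;    // {sig norm, ctl norm, shared mu}
  for (double ns = 5; ns < 25; ns += 0.25)
    for (double nc = 120; nc < 180; nc += 0.5)
      for (double mu = 4.8; mu < 5.2; mu += 0.01) {
        const double c = chi2(sig, ns, mu) + chi2(ctl, nc, mu);  // simultaneous chi2
        if (c < bestChi2) { bestChi2 = c; best[0] = ns; best[1] = nc; best[2] = mu; }
      }
  std::printf("Nsig = %.2f, Nctl = %.1f, shared mu = %.2f, chi2 = %.2f\n",
              best[0], best[1], best[2], bestChi2);
}
```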

Practical estimation: verifying the validity of your fit
- How to validate your fit? You want to demonstrate that
  1) your fit procedure gives on average the correct answer: 'no bias', and
  2) the uncertainty quoted by your fit is an accurate measure of the statistical spread in your measurement: 'correct error'.
- Validation is important for low-statistics fits: correct behavior is not obvious a priori due to the intrinsic ML bias proportional to 1/N.
- Basic validation strategy: a simulation study (see the sketch below).
  1) Obtain a large sample of simulated events.
  2) Divide your simulated events into O(…) samples with the same size as the problem under study.
  3) Repeat the fit procedure for each data-sized simulated sample.
  4) Compare the average of the fitted parameter values with the generated value: demonstrates (absence of) bias.
  5) Compare the spread in the fitted parameter values with the quoted parameter error: demonstrates (in)correctness of the error.
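A sketch of the loop above for the simplest possible fit (a Gaussian mean, where the ML estimate is just the sample average; the toy numbers are arbitrary):

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::normal_distribution<double> gen(5.0, 1.0);  // "true" model: mu = 5, sigma = 1
  const int nToys = 1000, nEvents = 50;
  double sum = 0, sum2 = 0;
  for (int t = 0; t < nToys; ++t) {
    double muHat = 0;                              // fit: for a known-width Gaussian
    for (int i = 0; i < nEvents; ++i) muHat += gen(rng);
    muHat /= nEvents;                              // the ML estimate is the sample mean
    sum += muHat; sum2 += muHat * muHat;
  }
  const double mean = sum / nToys;
  const double spread = std::sqrt(sum2 / nToys - mean * mean);
  std::printf("average fit: %.3f vs generated 5.000 -> bias check\n", mean);
  std::printf("spread: %.3f vs expected 1/sqrt(N) = %.3f -> error check\n",
              spread, 1.0 / std::sqrt((double)nEvents));
}
```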

Fit validation study: practical example
- Example fit model in 1-D (B mass):
  - Signal component is a Gaussian centered at the B mass.
  - Background component is an 'Argus' function (models phase space near the kinematic limit).
- Fit parameter under study: N_sig.
- Results of the simulation study: 1000 experiments with N_SIG(gen) = 100, N_BKG(gen) = 200; the distribution of N_sig(fit) is centered on the generated value, so this particular fit looks unbiased.

Fit validation study: the pull distribution
- What about the validity of the error?
  - The distribution of the error from simulated experiments is difficult to interpret...
  - We don't have an equivalent of N_sig(generated) for the error.
- Solution: look at the pull distribution.
  - Definition:
    \text{pull}(N_{sig}) = \frac{N_{sig}^{fit} - N_{sig}^{gen}}{\sigma(N_{sig})^{fit}}
  - Properties of the pull: the mean is 0 if there is no bias; the width is 1 if the error is correct.
- In this example: no bias and correct error, within the statistical precision of the study.
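The same toy study, extended to accumulate the pull (again for the Gaussian-mean model, where the per-experiment fit error is 1/sqrt(N) for known unit width):

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(7);
  std::normal_distribution<double> gen(5.0, 1.0);
  const int nToys = 1000, nEvents = 50;
  const double muGen = 5.0;
  double s = 0, s2 = 0;
  for (int t = 0; t < nToys; ++t) {
    double muHat = 0;
    for (int i = 0; i < nEvents; ++i) muHat += gen(rng);
    muHat /= nEvents;
    const double sigmaHat = 1.0 / std::sqrt((double)nEvents); // fit error for this model
    const double pull = (muHat - muGen) / sigmaHat;           // pull definition
    s += pull; s2 += pull * pull;
  }
  const double mean = s / nToys;
  std::printf("pull mean = %.3f (want 0), width = %.3f (want 1)\n",
              mean, std::sqrt(s2 / nToys - mean * mean));
}
```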

Fit validation study: low statistics example
- Special care should be taken when fitting small data samples, and also when fitting for a small signal component in a large sample.
- Possible causes of trouble:
  - χ² estimators may become approximate as the Gaussian approximation of Poisson statistics becomes inaccurate.
  - ML estimators may no longer be efficient, so the error estimate from the 2nd derivative may become inaccurate.
  - The bias term proportional to 1/N of ML and χ² estimators may no longer be small compared to 1/sqrt(N).
- In general, absence of bias and correctness of the error cannot be assumed. How to proceed?
  - Use unbinned ML fits only: most robust at low statistics.
  - Explicitly verify the validity of your fit.

Demonstration of fit bias at low N: pull distributions
- Low statistics example: the same scenario as before, but now with N_BKG(gen) = 200 background events and only N_SIG(gen) = 20 signal events (instead of 100).
- Results of the simulation study: the distributions of N_SIG(fit), σ(N_SIG) and pull(N_SIG) become asymmetric at low statistics; the pull mean is 2.3σ away from 0, so the fit is positively biased!
- Absence of bias and a correct error at low statistics are not obvious! Small yields are typically overestimated.

Fit validation study: how to obtain simulated events?
- Practical issue: usually you need very large numbers of simulated events for a fit validation study, of order 1000x the number of events in your fit.
  - Using data generated through a full GEANT-based detector simulation can be prohibitively expensive.
- Solution: use events sampled directly from your fit function.
  - Technique named 'toy Monte Carlo' sampling.
  - Advantage: easy to do and very fast.
  - Good for determining fit bias due to low statistics, choice of parameterization, boundary issues etc.
  - Cannot be used to test assumptions that went into the model (e.g. absence of certain correlations); you still need a full GEANT-based simulation for that.

Toy MC generation: accept/reject sampling
- How to sample events directly from your fit function?
- Simplest method: accept/reject sampling.
  1) Determine the maximum of the function, f_max.
  2) Throw a random number x.
  3) Throw another random number y.
  4) If y < f(x)/f_max keep x, otherwise return to step 2.
- PRO: easy, always works.
- CON: it can be inefficient if the function is strongly peaked. Finding the maximum empirically through random sampling can be lengthy in more than 2 dimensions.
(A code sketch of steps 1-4 follows below.)
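A sketch of steps 1-4 in plain C++ (the target shape f(x) = x² on [0,1] and f_max = 1 are arbitrary illustrations):

```cpp
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(1);
  std::uniform_real_distribution<double> u(0.0, 1.0);
  auto f = [](double x) { return x * x; };   // toy target shape on [0,1]
  const double fmax = 1.0;                   // step 1: maximum of f on [0,1]
  int kept = 0, tried = 0;
  while (kept < 5) {
    const double x = u(rng);                 // step 2: candidate x
    const double y = u(rng);                 // step 3: uniform y in [0, fmax]
    ++tried;
    if (y < f(x) / fmax) {                   // step 4: keep with probability f(x)/fmax
      std::printf("accepted x = %.3f\n", x);
      ++kept;
    }
  }
  std::printf("efficiency ~ %.2f\n", (double)kept / tried);  // low for peaked f
}
```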

Toy MC generation: inversion method
- Fastest: function inversion.
  1) Given f(x), find the inverted function F(x) so that f(F(x)) = x.
  2) Throw a uniform random number x.
  3) Return F(x).
- PRO: maximally efficient.
- CON: only works for invertible functions.
- Example: taking -log(x) of a uniform random number x yields an exponential distribution.
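The exponential example in code (using 1-u rather than u only to avoid log(0); both are uniform on the unit interval):

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(3);
  std::uniform_real_distribution<double> u(0.0, 1.0);
  // CDF of f(x) = exp(-x) is 1 - exp(-x); inverting gives x = -ln(1-u)
  for (int i = 0; i < 5; ++i)
    std::printf("x = %.3f\n", -std::log(1.0 - u(rng)));  // exponentially distributed
}
```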

Toy MC generation in a nutshell: importance sampling
- Hybrid: importance sampling.
  1) Find an 'envelope function' g(x) that is invertible into G(x) and that fulfills g(x) >= f(x) for all x.
  2) Generate a random number x from g using the inversion method.
  3) Throw a random number y.
  4) If y < f(x)/g(x) keep x, otherwise return to step 2.
- PRO: faster than plain accept/reject sampling; the function f does not need to be invertible.
- CON: you must be able to find an invertible envelope function g(x).
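A sketch combining the two methods: an envelope g(x) = e^(-x) sampled by inversion, with accept/reject applied against a non-invertible target f(x) = e^(-x) cos²(x). Both functions are illustrative; note cos² <= 1 guarantees g >= f.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(5);
  std::uniform_real_distribution<double> u(0.0, 1.0);
  auto f = [](double x) { const double c = std::cos(x); return std::exp(-x) * c * c; };
  auto g = [](double x) { return std::exp(-x); };    // invertible envelope, g(x) >= f(x)
  int kept = 0;
  while (kept < 5) {
    const double x = -std::log(1.0 - u(rng));        // step 2: sample from g by inversion
    if (u(rng) < f(x) / g(x)) {                      // steps 3-4: accept with prob f/g
      std::printf("accepted x = %.3f\n", x);
      ++kept;
    }
  }
}
```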

A 'simple' real-life example: measurement of the B0 and B+ lifetimes at BaBar
[Event sketch: Υ(4S) with βγ = 0.56; exclusively reconstructed B (σ_z ~ 65 μm) and tag B (σ_z ~ 110 μm); the vertex separation Δz gives Δt ≈ Δz/(βγc). Example decay products: D0 -> K- pi+, pions, K0]

Measurement of the B0 and B+ lifetimes at BaBar
1. Fully reconstruct one B meson (B_REC).
2. Reconstruct the decay vertex.
3. Reconstruct inclusively the vertex of the 'other' B meson (B_TAG).
4. Compute the proper time difference Δt ≈ Δz/(βγc).
5. Fit the Δt spectra.

Measurement of the B0 and B+ lifetimes at BaBar (continued)
- Steps 1-2: fully reconstruct one B meson (B_REC) and its decay vertex.
- Example decay chain (30 fb⁻¹): B0 -> D*+ pi-, with D*+ -> D0 pi+ and D0 -> K- pi+.
[Plots: candidate mass distributions for neutral and charged B mesons, in GeV]

Measurement of the B0 and B+ lifetimes at BaBar (continued)
- Step 1a: classify signal and background using the energy-substituted mass m_ES [MeV/c²].
- At BaBar one produces exactly 2 B mesons per event, so in the case of signal the B energy (in the center-of-mass frame) is half the center-of-mass energy of the collider.
[Plots: m_ES distributions for neutral and charged B mesons]

Signal proper-time PDF
- The physics PDF for the proper time difference is a two-sided exponential decay:
  f(\Delta t) = \frac{1}{2\tau} e^{-|\Delta t|/\tau}

Including the detector response...
- Must take into account the detector response: convolve the 'physics PDF' with the 'response function' (a.k.a. resolution function).
- Example: e^{-|Δt|/τ} (lifetime) convolved with a resolution function gives the measured Δt distribution.
- Caveat: the real-world response function is somewhat more complicated; e.g. additional information from the reconstruction of the decay vertices is used...
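A toy illustration of the convolution idea (not the BaBar response model): generate the true Δt from the two-sided exponential by inversion, then add a Gaussian resolution smear; the smeared values follow the convolved PDF. Lifetime and resolution values are invented.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(11);
  std::uniform_real_distribution<double> u(0.0, 1.0);
  std::normal_distribution<double> smear(0.0, 0.6);  // toy resolution sigma, in ps
  const double tau = 1.5;                            // toy lifetime, in ps
  for (int i = 0; i < 5; ++i) {
    const double sign = (u(rng) < 0.5) ? -1.0 : 1.0; // e^{-|dt|/tau} is two-sided
    const double dtTrue = sign * (-tau * std::log(1.0 - u(rng)));
    std::printf("dt(true) = %+.2f ps, dt(measured) = %+.2f ps\n",
                dtTrue, dtTrue + smear(rng));        // physics (x) response
  }
}
```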

How to deal with the background?
- Use the m_ES sideband: events with m_ES < 5.27 GeV/c² give a background-dominated sample from which the background Δt shape can be parameterized.
[Plot: B0 background Δt distribution for m_ES < 5.27 GeV/c²]

Putting the ingredients together...
[Plot: Δt spectrum (ps) with signal + background components overlaid]

Measurement of the B0 and B+ lifetimes at BaBar: results
- Strategy: fit the mass, fix those parameters, then perform the Δt fit.
- 19 free parameters in the Δt fit:
  - 2 lifetimes
  - 5 resolution parameters (Δt RF parameterization: a common Δt response function for B+ and B0)
  - 12 parameters for the empirical background description
- Results (PRL 87 (2001)): τ⁰ = … ps, τ⁺ = … ps, τ⁺/τ⁰ = …
[Plots: Δt spectra (signal + background) for B0 and B+]

Neutral B meson mixing
- B0 mesons can 'oscillate' into anti-B0 mesons, and vice versa.
- The process is described through 2nd-order weak (box) diagrams.
- The observation of B0-B0bar mixing in 1987 was the first evidence of a really heavy top quark.

Measurement of B0-B0bar mixing
1. Fully reconstruct one B meson in a flavor eigenstate (B_REC).
2. Reconstruct the decay vertex.
3. Reconstruct inclusively the vertex of the 'other' B meson (B_TAG).
4. Determine the flavor of B_TAG to separate mixed and unmixed events.
5. Compute the proper time difference Δt.
6. Fit the Δt spectra of mixed and unmixed events.

Determine the flavour of the 'other' B
[Diagrams: Lepton tag, the b -> c l- transition (B0bar -> D, D* plus lepton via W-), where the lepton charge tags the b flavour; Kaon tag, the b -> c -> s cascade (e.g. via K*0), where the kaon charge tags the b flavour]

Δt distribution of mixed and unmixed events
- With perfect flavor tagging and time resolution:
  f_{unmixed/mixed}(\Delta t) = \frac{e^{-|\Delta t|/\tau}}{4\tau} \left[ 1 \pm \cos(\Delta m_d \Delta t) \right]
- With realistic mis-tagging and finite time resolution, the oscillation amplitude is diluted and the PDF is convolved with the resolution function:
  f_{unmixed/mixed}(\Delta t) \propto e^{-|\Delta t|/\tau} \left[ 1 \pm (1-2w)\cos(\Delta m_d \Delta t) \right] \otimes R(\Delta t)
- w: the fraction of wrongly tagged events; Δm_d: the oscillation frequency.
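A sketch of the idealized (pre-resolution) PDFs quoted above; the printed mixing asymmetry should come out as (1-2w)·cos(Δm_d·Δt). The parameter values are toy inputs, not measured numbers.

```cpp
#include <cmath>
#include <cstdio>

// f_{unmixed/mixed}(dt) = e^{-|dt|/tau}/(4 tau) * [1 +/- (1-2w) cos(dm*dt)]
double mixPdf(double dt, double tau, double dm, double w, bool mixed) {
  const double sign = mixed ? -1.0 : +1.0;
  return std::exp(-std::fabs(dt) / tau) / (4.0 * tau)
         * (1.0 + sign * (1.0 - 2.0 * w) * std::cos(dm * dt));
}

int main() {
  const double tau = 1.5, dm = 0.5, w = 0.2;       // toy values (ps, ps^-1, mistag)
  for (double dt = 0.0; dt <= 10.0; dt += 2.5) {
    const double unm = mixPdf(dt, tau, dm, w, false);
    const double mix = mixPdf(dt, tau, dm, w, true);
    std::printf("dt = %4.1f ps  asymmetry = %+.3f\n", dt, (unm - mix) / (unm + mix));
  }
}
```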

Normalization and counting...
- Counting matters! A likelihood fit (implicitly!) uses the integrated rates, unless you explicitly normalize both populations separately.
- Acceptance matters! Unless the acceptance for both populations is the same.
- Can/must check that the shape result is consistent with counting.

Mixing likelihood fit
- Unbinned maximum likelihood fit to the flavor-tagged neutral B sample; 44 total free parameters:
  - Δm_d: 1
  - Mistag fractions for B0 and B0bar tags: 8
  - Signal resolution function (scale factor, bias, fractions): 8+8 = 16
  - Empirical description of the background Δt: 19
- The B lifetime is fixed to the PDG value τ_B = … ps.
- All Δt parameters are extracted from data.

Complex fits?
- No matter how you get the background parameters, you have to know them anyway.
- Could equally well first fit the sideband only, in a separate fit, and propagate the numbers.
- But then you get to propagate the statistical errors (+ correlations!) on those numbers yourself.
[Plots: M_ES < 5.27 (sideband) and M_ES > 5.27 (signal region); PRD 66 (2002)]

Mixing likelihood fit result
- Δm_d = 0.516 ± 0.016 ± 0.010 ps⁻¹ (PRD 66 (2002))

Measurement of CP violation in B -> J/ψ K_S
1. Fully reconstruct one B meson in a CP eigenstate (B_REC).
2. Reconstruct the decay vertex.
3. Reconstruct inclusively the vertex of the 'other' B meson (B_TAG).
4. Determine the flavor of B_TAG to separate B0 and B0bar.
5. Compute the proper time difference Δt.
6. Fit the Δt spectra of B0- and B0bar-tagged events.

Δt spectrum of CP events
- With perfect flavor tagging and time resolution, the CP PDF is
  f(\Delta t) = \frac{e^{-|\Delta t|/\tau}}{4\tau} \left[ 1 \pm \sin 2\beta \, \sin(\Delta m_d \Delta t) \right]
  with the sign set by the tag flavor.
- With realistic mis-tagging and finite time resolution, the amplitude is diluted by the mistag fractions w and the PDF is convolved with the resolution function R.
- The mistag fractions w and the resolution function R are determined by the flavor (mixing) sample.

Most recent sin2β results: 227 million BB events
- Simultaneous fit to the mixing sample and the CP sample.
- The CP sample is split in various ways (J/ψ K_S vs. J/ψ K_L, ...).
- All signal and background properties are extracted from data.
- sin2β = … (stat) … (sys)

CP fit parameters [30/fb, LP 2001]
- The CP fit is basically the mixing fit, with a few more events (which have a slightly different physics PDF) and 2 more parameters.
- Compared to the mixing fit, add 2 parameters: the CP asymmetry sin(2β) and the prompt background fraction (CP events); remove 1 parameter: Δm. And include some extra events...
- Total 45 parameters:
  - 20 describe the background
  - 1 is specific to the CP sample
  - 8 describe the signal mistag rates
  - 16 describe the resolution function
  - and then of course sin(2β).
- Note: back in 2001 there was a split into run1/run2, which is the cause of the doubling of the resolution parameters (8+3 = 11 extra parameters!).

Consistent results when the data is split by decay mode and tagging category
- χ² = 11.7/6 d.o.f., Prob(χ²) = 7%
- χ² = 1.9/5 d.o.f., Prob(χ²) = 86%

Commercial Break

RooFit: a general purpose toolkit for data modeling
- Wouter Verkerke (NIKHEF), David Kirkby (UC Irvine)
- This talk comes with free software that helps you do many labor-intensive analysis and fitting tasks much more easily.

RooFit at SourceForge: roofit.sourceforge.net
- RooFit is available at SourceForge to facilitate access and communication with all users.
- Code access:
  - CVS repository via pserver
  - File distribution sets for production versions

RooFit at SourceForge: documentation
- Comprehensive set of tutorials (PPT slide show + example macros).
- Five separate tutorials; more than 250 slides and 20 macros in total.
- Class reference in THtml style.

The End
- Some material for further reading:
  - R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989
  - L. Lyons, Statistics for Nuclear and Particle Physics, Cambridge University Press
  - G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998 (see also his 10-hour post-graduate web course)