Presentation is loading. Please wait.

Presentation is loading. Please wait.

RooFit A tool kit for data modeling in ROOT

Similar presentations


Presentation on theme: "RooFit A tool kit for data modeling in ROOT"— Presentation transcript:

1 RooFit A tool kit for data modeling in ROOT
Wouter Verkerke (NIKHEF) David Kirkby (UC Irvine) Wouter Verkerke, NIKHEF

2 Focus: coding a probability density function
Focus on one practical aspect of many data analysis in HEP: How do you formulate your p.d.f. in ROOT For ‘simple’ problems (gauss, polynomial), ROOT built-in models well sufficient But if you want to do unbinned ML fits, use non-trivial functions, or work with multidimensional functions you are quickly running into trouble Wouter Verkerke, NIKHEF

3 A recent example BaBar experiment at SLAC: Extract sin(2b) from time dependent CP violation of B decay: e+e-  Y(4s)  BB Reconstruct both Bs, measure decay time difference Physics of interest is in decay time dependent oscillation Many issues arise Standard ROOT function framework clearly insufficient to handle such complicated functions  must develop new framework Normalization of p.d.f. not always trivial to calculate  may need numeric integration techniques Unbinned fit, >2 dimensions, many events  computation performance important  must try optimize code for acceptable performance Simultaneous fit to control samples to account for detector performance Wouter Verkerke, NIKHEF

4 A recent example Initial approach BaBar: write it from scratch (in FORTRAN!) Does its designated job quite well, but took a long time to develop Possible because sin(2b) effort supported by O(50) people. Optimization of ML calculations hand-coded  error prone and not easy to maintain Difficult to transfer knowledge/code from one analysis to another. A better solution: A modeling language in C++ that integrates seamlessly into ROOT Recycle code and knowledge Development of RooFit package Started 5 years ago. Very successful  virtually everybody in BaBar uses it  now in standard ROOT distribution Wouter Verkerke, NIKHEF

5 RooFit – a data modeling language for ROOT
Extension to ROOT – (Almost) no overlap with existing functionality C++ command line interface & macros Data management & histogramming Graphics interface I/O support MINUIT ToyMC data Generation Data/Model Fitting Data Modeling Model Visualization Wouter Verkerke, NIKHEF

6 Data modeling - Desired functionality
Building/Adjusting Models Easy to write basic PDFs ( normalization) Easy to compose complex models (modular design) Reuse of existing functions Flexibility – No arbitrary implementation-related restrictions A n a l y s i s w o r k c y c l e Using Models Fitting : Binned/Unbinned (extended) MLL fits, Chi2 fits Toy MC generation: Generate MC datasets from any model Visualization: Slice/project model & data in any possible way Speed – Should be as fast or faster than hand-coded model Wouter Verkerke, NIKHEF

7 Data modeling – OO representation
Idea: represent math symbols as C++ objects Result: 1 line of code per symbol in a function (the C++ constructor) rather than 1 line of code per function Mathematical concept RooFit class variable RooRealVar function RooAbsReal PDF RooAbsPdf space point RooArgSet integral RooRealIntegral list of space points RooAbsData Wouter Verkerke, NIKHEF

8 Data modeling – Constructing composite objects
Straightforward correlation between mathematical representation of formula and RooFit code Math RooGaussian g RooFit diagram RooRealVar x RooRealVar m RooFormulaVar sqrts RooRealVar s RooFit code  RooRealVar x(“x”,”x”,-10,10) ;  RooRealVar m(“m”,”mean”,0) ;  RooRealVar s(“s”,”sigma”,2,0,10) ;  RooFormulaVar sqrts(“sqrts”,”sqrt(s)”,s) ;  RooGaussian g(“g”,”gauss”,x,m,sqrts) ; Wouter Verkerke, NIKHEF

9 Model building – (Re)using standard components
RooFit provides a collection of compiled standard PDF classes RooBMixDecay Physics inspired ARGUS,Crystal Ball, Breit-Wigner, Voigtian, B/D-Decay,…. RooPolynomial RooHistPdf Non-parametric Histogram, KEYS RooArgusBG RooGaussian Basic Gaussian, Exponential, Polynomial,… Chebychev polynomial Easy to extend the library: each p.d.f. is a separate C++ class Wouter Verkerke, NIKHEF

10 Importance of a good and extendable model library
If ‘new’ p.d.f.s are made available in a easy-to-use form, more people will use them Example 1: non-parametric kernel estimation technique See article by Kyle Cranmer People like it, but are put off by the prospect of thoroughly reading the entire article, coding the concept from scratch, and then using it. In RooFit switching from a histogram-based shape to a KEYS shape is a one-line code change (RooHistPdf  RooKeysPdf) Example 2: Chebychev polynomials Its easy to convince people to try them if all they need to do is a one-line code change (RooPolynomial  RooChebychev) Wouter Verkerke, NIKHEF

11 Model building – (Re)using standard components
Library p.d.f.s can be adjusted on the fly. Just plug in any function expression you like as input variable Works universally, even for classes you write yourself Maximum flexibility of library shapes keeps library small g(x;m,s) m(y;a0,a1) g(x,y;a0,a1,s) RooPolyVar m(“m”,y,RooArgList(a0,a1)) ; RooGaussian g(“g”,”gauss”,x,m,s) ; Wouter Verkerke, NIKHEF

12 Handling of p.d.f normalization
Normalization of (component) p.d.f.s to unity is often a good part of the work of writing a p.d.f. RooFit handles most normalization issues transparently to the user P.d.f can advertise (multiple) analytical expressions for integrals When no analytical expression is provided, RooFit will automatically perform numeric integration to obtain normalization More complicated that it seems: even if gauss(x,m,s) can be integrated analytically over x, gauss(f(x),m,s) cannot. Such use cases are automatically recognized. Multi-dimensional integrals can be combination of numeric and p.d.f-provided analytical partial integrals Variety of numeric integration techniques is interfaced Adaptive trapezoid, Gauss-Kronrod, VEGAS MC… User can override parameters globally or per p.d.f. as necessary Wouter Verkerke, NIKHEF

13 Model building – (Re)using standard components
Most physics models can be composed from ‘basic’ shapes RooBMixDecay RooPolynomial RooHistPdf RooArgusBG RooGaussian + RooAddPdf Wouter Verkerke, NIKHEF

14 Model building – (Re)using standard components
Most physics models can be composed from ‘basic’ shapes RooBMixDecay RooProdPdf h(“h”,”h”, RooArgSet(f,g)) RooPolynomial RooHistPdf RooArgusBG RooProdPdf k(“k”,”k”,g, Conditional(f,x)) RooGaussian * RooProdPdf Wouter Verkerke, NIKHEF

15 Using models - Overview
All RooFit models provide universal and complete fitting and Toy Monte Carlo generating functionality Model complexity only limited by available memory and CPU power Fitting/plotting a 5-D model as easy as using a 1-D model Most operations are one-liners Fitting Generating data = gauss.generate(x,1000) RooAbsPdf gauss.fitTo(data) RooDataSet RooAbsData Wouter Verkerke, NIKHEF

16 Fitting functionality
pdf.fitTo(data) Binned or unbinned ML fit? For RooFit this is the essentially the same! Nice example of C++ abstraction through inheritance Fitting interface takes abstract RooAbsData object Supply unbinned data object (created from TTree)  unbinned fit Supply histogram object (created from THx)  binned fit Binned Unbinned x y z 1 3 5 2 4 6 RooDataSet RooDataHist RooAbsData Wouter Verkerke, NIKHEF

17 Automated vs. hand-coded optimization of p.d.f.
Automatic pre-fit PDF optimization Prior to each fit, the PDF is analyzed for possible optimizations Optimization algorithms: Detection and precalculation of constant terms in any PDF expression Function caching and lazy evaluation Factorization of multi-dimensional problems where ever possible Optimizations are always tailored to the specific use in each fit. Possible because OO structure of p.d.f. allows automated analysis of structure No need for users to hard-code optimizations Keeps your code understandable, maintainable and flexible without sacrificing performance Optimization concepts implemented by RooFit are applied consistently and completely to all PDFs Speedup of factor 3-10 reported in realistic complex fits Fit parallelization on multi-CPU hosts Option for automatic parallelization of fit function on multi-CPU hosts (no explicit or implicit support from user PDFs needed) Wouter Verkerke, NIKHEF

18 MC Event generation pdf.generate(vars,nevt)
Sample “toy Monte Carlo” samples from any PDF Accept/reject sampling method used by default MC generation method automatically optimized PDF can advertise internal generator if more efficient methods exists (e.g. Gaussian) Each generator request will use the most efficient accept-reject / internal generator combination available Operator PDFs (sum,product,…) distribute generation over components whenever possible Generation order for products with conditional PDFs is sorted out automatically Possible because OO structure of p.d.f. allows automated analysis of structure Can also specify template data as input to generator Look more carefully at accept/reject side bar pdf.generate(vars,data) Wouter Verkerke, NIKHEF

19 Using models – Plotting
Model visualization geared towards ‘publication plots’ not interactive browsing  emphasis on 1-dimensional plots Simplest case: plotting a 1-D model over data Modular structure of composite p.d.f.s allows easy access to components for plotting Can show Poisson confidence intervals instead of sqrt(N) errors RooPlot* frame = mes.frame() ; data->plotOn(frame) ; pdf->plotOn(frame) ; pdf->plotOn(frame,Components(“bkg”)) frame->Draw() ; Adaptive spacing of curve points to achieve 1‰ precision Can store plot with data and all curves as single object Wouter Verkerke, NIKHEF

20 Visualizing multi-dimensional models
IMHO: Visualization of multi-dim. models much more complicated than visualization of multi-dim. data Simplest scenario: projection plots Simple example M(m,t) = f[S(m)*T(t)]+(1-f)[B(m)*U(t)] Provide intuitive behavior: projection of p.d.f. automatically follows projection of data: 1-D plot frame remembers ‘hidden’ variables in dataset. Subsequent p.d.f.s in same frame are automatically projected Works identically for non-factorizable p.d.f.s! Necessary integrals are calculated automatically either analytical or numerically mB distribution Dt distribution RooPlot* frame = dt.frame() ; data.plotOn(frame) ; model.plotOn(frame) ; model.plotOn(frame,Components(“bkg”)) ; Wouter Verkerke, NIKHEF

21 Using models – plotting multi-dimensional models
Projection plot usually doesn’t capture full information content of your p.d.f.  equally easy to plot a ‘slice’ mB distribution Dt distribution RooPlot* frame = dt.frame() ; data.plotOn(frame) ; model.plotOn(frame) ; model.plotOn(frame,Components(“bkg”)) ; RooPlot* frame = dt.frame() ; dt.setRange(“sel”,5.27,5.30) ; data.plotOn(frame,CutRange(“sel”)) ; model.plotOn(frame,ProjectionRange(“sel”)); model.plotOn(frame,ProjectionRange(“sel”), Components(“bkg”)) ; More complicated ‘slice’ views such as a likelihood ratio plot only take O(3) lines of extra code Wouter Verkerke, NIKHEF

22 Using models – plotting
Can also use template data to show ‘average’ curve Convenient for visualization of conditional p.d.f.s Example plot t distribution of following model over data, averaging over dt values of data model.plotOn(frame,ProjWData(dtData)) ; model.plotOn(frame,Components("bkg"), ProjWData(dtData),LineStyle(kDashed)) ; Wouter Verkerke, NIKHEF

23 Advanced features – Task automation
Support for routine task automation, e.g. useful in finding confidence intervals Accumulate fit statistics Input model Generate toy MC Fit model Distribution of - parameter values - parameter errors - parameter pulls Repeat N times // Instantiate MC study manager RooMCStudy mgr(inputModel) ; // Generate and fit 100 samples of 1000 events mgr.generateAndFit(100,1000) ; // Plot distribution of sigma parameter mgr.plotParam(sigma)->Draw() Wouter Verkerke, NIKHEF

24 RooFit at SourceForge - roofit.sourceforge.net
RooFit moved to SourceForge to facilitate access and communication with non-BaBar users Code access CVS repository via pserver File distribution sets for production versions Wouter Verkerke, NIKHEF

25 RooFit at SourceForge - Documentation
Five separate tutorials More than 250 slides and 20 macros in total Documentation Comprehensive set of tutorials (PPT slide show + example macros) Also available by end of 2005: Pedagogical User Guide + Reference Guide (~100 pages document) Class reference in THtml style Wouter Verkerke, NIKHEF

26 Selection of BaBar publications in 2004 using RooFit
Improved Measurement of the CKM angle alpha using B0r+r- Measurement of branching fractions and charge asymmetries in B+ decays to hp+, hK+, hr+ and h'p+, and search for B0 decays to hK0 and hw Branching Fraction and CP Asymmetries in B0KSKSKS Measurement of CP Asymmetries in B0fK0 and B0K+K-K0S Measurement of Branching Fractions and Time-Dependent CP-Violating Asymmetries in Bh' K Decays Improved Measurement of Time-Dependent CP Violation in B0 to (ccbar)K0 Decays (‘sin2b’) Measurements of the Branching Fraction and CP-Violating Asymmetries in B0f0(980)KS Decays Measurement of Time-dependent CP-Violating Asymmetries in B0K*g, K*KSp0 Decays Study of the decay B0r+r- and constraints on the CKM angle alpha. Measurement of CP-violating Asymmetries in B0K0Sp0 Decays Measurement of Time-Dependent CP Asymmetries in B0fK0 Search for B+/-  [K-/+ p+/-]D K+/- and upper limit on the bu amplitude in B+/-  DK+/- Limits on the Decay-Rate Difference of Neutral B Mesons and on CP, T, and CPT Violation in B0B0bar Oscillations Wouter Verkerke, NIKHEF

27 Summary RooFit adds a powerful data modeling language to ROOT
Mathematical objects (variables, functions…) represented as C++ objects Complex models are easily composed from library of standard components New fundamental components can be written easily Powerful tools for fitting, Toy MC generation and visualization are easy to use Compact syntax Universal functionality (almost no arbitrary / implementation related restrictions) Automated function optimization analysis and implementation delivers industrial strength performance RooFit makes the road from a ‘simple’ to a really elaborate/complex p.d.f. less steep: RooFit is distributed with ROOT starting with ROOT version v Wouter Verkerke, NIKHEF


Download ppt "RooFit A tool kit for data modeling in ROOT"

Similar presentations


Ads by Google