RooUnfold unfolding framework and algorithms Tim Adye Rutherford Appleton Laboratory Oxford ATLAS Group Meeting 13 th May 2008.

Outline
1. What is unfolding? And why might you want to do it?
2. Overview of a few techniques: regularised unfolding; the iterative method.
3. Some details: filling the response matrix; choice of the regularisation parameter.
4. The RooUnfold package: currently implements three algorithms with a common interface.
5. Status and plans.
6. References.

Unfolding
In other fields this is known as “deconvolution” or “unsmearing”. Given a “true” PDF in μ that is corrupted by detector effects, described by a response function R, we measure a distribution in ν. For a binned distribution:
    ν_i = Σ_j R_ij μ_j
This may involve
1. inefficiencies: lost events;
2. bias and smearing: events moving between bins (off-diagonal R_ij).
With infinite statistics, it would be possible to recover the original PDF by inverting the response matrix.
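Numerically, the binned relation is just a matrix-vector product. A minimal sketch in plain Python (the 2-bin response and bin contents are invented for illustration):

```python
# Forward folding: nu_i = sum_j R_ij * mu_j
# R[i][j] = P(measured in bin i | true in bin j); column sums below 1
# encode the inefficiency.
def fold(R, mu):
    return [sum(R[i][j] * mu[j] for j in range(len(mu))) for i in range(len(R))]

# Made-up 2-bin response: 10% migration each way, 90% efficiency per true bin.
R = [[0.8, 0.1],
     [0.1, 0.8]]
mu = [100.0, 200.0]
nu = fold(R, mu)  # ~ [100.0, 170.0]
```

The column sums of R are the per-bin efficiencies, so folding both smears the distribution and loses events.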

Not so simple…
Unfortunately, statistical fluctuations between bins destroy this information. Since R washes out statistical fluctuations, R⁻¹ cannot distinguish between wildly fluctuating and smooth PDFs, so direct inversion produces large negative correlations between adjacent bins and large fluctuations in the reconstructed bin contents.
We need some procedure to remove wildly fluctuating solutions:
1. give added weight to “smoother” solutions;
2. solve for μ iteratively, starting from a reasonable guess, and truncate the iteration before it gets out of hand;
3. ignore bin-to-bin fluctuations altogether.
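A toy 2-bin example (numbers invented) shows why naive inversion misbehaves: the inverse response has large entries of opposite sign, so a small fluctuation in the measured bins becomes a large, anticorrelated swing in the unfolded ones.

```python
# Toy 2-bin response with strong smearing; its inverse amplifies fluctuations.
R = [[0.6, 0.4],
     [0.4, 0.6]]
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]    # 0.2
Rinv = [[ R[1][1] / det, -R[0][1] / det],      # approximately [[ 3, -2],
        [-R[1][0] / det,  R[0][0] / det]]      #                [-2,  3]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

smooth = apply(Rinv, [100.0, 100.0])  # ~ [100.0, 100.0]: fine
fluct  = apply(Rinv, [110.0, 90.0])   # ~ [150.0,  50.0]: a 10% fluctuation
                                      #   becomes a 50% anticorrelated swing
```

With finer binning and stronger smearing, R gets closer to singular and the amplification becomes far worse.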

What happens if you don’t regularise

True Gaussian, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a different Gaussian

So why don’t we always do this?
- If the true PDF and response function can be parameterised, then a maximum-likelihood fit is usually more convenient: it directly returns the parameters of interest and does not require binning.
- If the response matrix doesn’t include smearing (i.e. it is diagonal), then a bin-by-bin efficiency correction can be applied directly.
- If the result is only needed for comparison (e.g. with MC), one could instead apply the response function to the MC; that is simpler than un-applying the response to the data.
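The bin-by-bin correction for the diagonal case is just a per-bin scale factor taken from the training MC. A minimal sketch in plain Python (function name and all numbers invented for illustration):

```python
# Bin-by-bin efficiency correction: valid only when there is no migration
# between bins (diagonal response).
def bin_by_bin(measured, mc_true, mc_reco):
    # Correction factor per bin: true/reconstructed counts in the training MC.
    return [m * t / r for m, t, r in zip(measured, mc_true, mc_reco)]

mc_true = [1000.0, 2000.0]
mc_reco = [800.0, 1600.0]     # 80% efficiency in each bin
data    = [400.0, 640.0]
unfolded = bin_by_bin(data, mc_true, mc_reco)  # ~ [500.0, 800.0]
```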

When to use unfolding
Use unfolding to recover a theoretical distribution when:
- there is no a-priori parameterisation, and
- it is needed for the result, not just for comparison with MC, and
- there is significant bin-to-bin migration of events.

1. Regularised unfolding
Use maximum likelihood to fit the smeared bin contents to the measured data, but include a regularisation function, S, whose weight, the regularisation parameter α, controls the degree of smoothness (select α to, e.g., minimise the mean squared error).
Various choices of the regularisation function S are used:
- Tikhonov regularisation: minimise the curvature, for some definition of curvature.
  - Implemented as part of RUN by Volker Blobel.
  - RooUnfHistoSvd, by Kerstin Tackmann and Heiko Lacker (BaBar), based on GURU by Andreas Höcker and Vakhtang Kartvelishvili, uses a Singular Value Decomposition of the response matrix to simplify the regularisation procedure.
- Maximum entropy.
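Schematically, the regularised fit minimises a penalised likelihood. One common curvature-based choice of S is sketched below (illustrative only; not necessarily the exact form used in RUN or GURU):

```latex
\hat{\mu} = \arg\min_{\mu}\left[\, -2\ln L(\nu; R\mu) \;+\; \alpha\, S(\mu) \,\right],
\qquad
S(\mu) = \sum_i \left( \mu_{i+1} - 2\mu_i + \mu_{i-1} \right)^2
```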

2. Iterative method
Uses Bayes’ theorem to invert the response: starting from an initial set of probabilities p_i (e.g. MC truth), obtain an improved estimate of the true bin contents, then repeat with new p_i taken from those bin contents. The procedure converges quite rapidly, and truncating the iteration prevents us from seeing the bad effects of statistical fluctuations.
Fergus Wilson and I have implemented this method in ROOT/C++; it supports the 1D, 2D, and 3D cases.
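In the usual formulation, with R_ji = P(measured bin j | true bin i), efficiency ε_i = Σ_j R_ji, and measured contents n_j, each iteration updates μ_i ← (1/ε_i) Σ_j n_j R_ji p_i / (Σ_k R_jk p_k). A compact plain-Python sketch (not the RooUnfold code itself; the toy response and counts are invented):

```python
def bayes_unfold(R, measured, prior, iterations=4):
    """Iterative Bayesian unfolding.
    R[j][i] = P(measured bin j | true bin i); column sums give the efficiency."""
    nt, nm = len(prior), len(measured)
    eff = [sum(R[j][i] for j in range(nm)) for i in range(nt)]
    p = list(prior)
    unfolded = list(prior)
    for _ in range(iterations):
        unfolded = [0.0] * nt
        for j in range(nm):
            norm = sum(R[j][k] * p[k] for k in range(nt))
            if norm > 0.0:
                for i in range(nt):
                    unfolded[i] += measured[j] * R[j][i] * p[i] / norm
        # Correct for inefficiency, then renormalise to get the next prior.
        unfolded = [u / e if e > 0.0 else 0.0 for u, e in zip(unfolded, eff)]
        total = sum(unfolded)
        p = [u / total for u in unfolded]
    return unfolded

R = [[0.8, 0.2],
     [0.2, 0.8]]
result = bayes_unfold(R, [44.0, 56.0], prior=[0.5, 0.5], iterations=10)
# drifts from the flat prior toward the matrix-inversion answer [40, 60]
```

Stopping after a few iterations is what regularises the result: the prior still pulls the solution toward smoothness.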

Response matrix
The response matrix may be known a priori, but usually it is determined from Monte Carlo; this process is referred to as “training”. To reduce systematic effects, use a training distribution close to the data.
For unfolding a 1D distribution, the response matrix can be represented as a 2D histogram filled with MC values of (x_measured, x_true):
- each x_true column should be normalised to its reconstruction efficiency;
- an event is either measured with a value x_measured, or accounted for in the inefficiency.
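Filling such a matrix from an MC training sample might look like the following plain-Python sketch (function and variable names are invented; in RooUnfold the RooUnfoldResponse class plays this role). Lost events enter only the normalisation, implementing the “measured or accounted for in the inefficiency” rule:

```python
def fill_response(pairs, n_bins, lo, hi):
    """Build R[j][i] = P(measured bin j | true bin i) from MC (x_true, x_meas)
    pairs, where x_meas is None for events lost to inefficiency."""
    width = (hi - lo) / n_bins

    def to_bin(x):
        return min(int((x - lo) / width), n_bins - 1)

    counts = [[0] * n_bins for _ in range(n_bins)]  # [measured bin][true bin]
    n_true = [0] * n_bins                           # all generated events per true bin
    for x_true, x_meas in pairs:
        i = to_bin(x_true)
        n_true[i] += 1                              # lost events still count here
        if x_meas is not None:
            counts[to_bin(x_meas)][i] += 1
    # Normalise each true column by ALL its generated events, so each column
    # sums to that bin's reconstruction efficiency.
    return [[counts[j][i] / n_true[i] if n_true[i] else 0.0
             for i in range(n_bins)]
            for j in range(n_bins)]

# Toy training sample: 2 bins on [0, 2); one event lost (x_meas = None).
pairs = [(0.5, 0.4), (0.5, 1.2), (0.5, None), (1.5, 1.6)]
R = fill_response(pairs, 2, 0.0, 2.0)
# true-bin-0 column ~ [1/3, 1/3] (efficiency 2/3); true-bin-1 column ~ [0, 1]
```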

Double Breit-Wigner, with Gaussian smearing, systematic translation, and variable inefficiency – trained using a single Gaussian

Choice of regularisation parameter
In both types of algorithm, the regularisation parameter determines the relative weight placed on the data compared to the training MC truth, or equivalently the balance between statistical and systematic errors.
- One extreme favours the data, at the risk of statistical fluctuations being interpreted as true structure. It gives larger statistical errors, but these can at least be determined. In the limit it can become equivalent to matrix inversion, though numerical effects often appear first.
- The other extreme favours the training-sample truth. If the MC truth differs from the data (as it surely will, otherwise why do the experiment!), this leads to larger systematic errors.
Of course, one chooses a value somewhere between these extremes. This choice can be optimised and tested with MC samples that are statistically and systematically independent of the training sample; the optimum will depend on the number of events and the binning. This step can usually be performed with toy MC samples.

RooUnfold package
Makes these different methods available as ROOT/C++ classes with a common interface to specify:
- the unfolding method and its parameters;
- the response matrix: pass it in directly, or fill it from an MC sample (RooUnfold takes care of the normalisation);
- the measured histogram.
It returns the reconstructed truth histogram and its errors; the full covariance matrix is also available. It also simplifies the handling of multiple dimensions when these are supported by the underlying algorithm. This should make it easy to try and compare different methods in your analysis.

2D Unfolding Example
2D smearing, bias, variable efficiency, and variable rotation

RooUnfold classes
- RooUnfoldResponse: the response matrix, with various filling and access methods; create it from MC and use it on data (it can be stored in a file).
- RooUnfold: unfolding algorithm base class.
- RooUnfoldBayes: the iterative method.
- RooUnfoldSvd: interface to the RooUnfHistoSvd package.
- RooUnfoldBinByBin: simple bin-by-bin method; a trivial implementation, but useful for comparison with full unfolding.
- RooUnfoldExample: a simple 1D example.
- RooUnfoldTest and RooUnfoldTest2D: tests with different training and unfolding distributions.

Plans and possible improvements
1. Simplify the interface: a new RooUnfoldDistribution class for more filling/output options; consistent handling of multi-dimensional unfolding with any number of dimensions; access by histogram (THxD), vector (TVectorD), or matrix (TMatrixD); perhaps other data types, e.g. float rather than double. This should be mostly upwardly compatible, so users don’t have to change code.
2. Add common tools, useful for all algorithms: automatic calculation of figures of merit (e.g. χ²), though standard ROOT functions can also be used on the histograms; simplify or automate the selection of the regularisation parameter.
3. More algorithms? Maximum-entropy regularisation; simple (if slow) matrix inversion without regularisation, perhaps useful with large statistics; investigate techniques used in astrophysics, e.g. CLEAN.
4. Incorporate RooUnfold as an official ROOT package?

RooUnfold status
RooUnfold was originally developed in the BaBar framework; I have subsequently released a stand-alone version. This is what I will continue to develop, so that it can be used everywhere.
There seems to be some interest in the HEP community, at least judging by the number of questions I have received from various experiments.
Unfortunately, I have not had time for much development; so far this has been a “spare time” activity for me. I am working with Fergus Wilson, who is interested in trying out some other algorithms.

References
RooUnfold code, documentation, and references to unfolding reviews and techniques can be found on this web page.