Statistical Methods for Data Analysis: a RooStats example

Statistical Methods for Data Analysis: a RooStats example
Luca Lista, INFN Napoli

RooStats toolkit

Concepts:
- PDF modeling: done via the RooFit package
- Workspace: an area where the PDF and data model can be defined and saved to disk for later use
- Interval calculator: abstract class for computing confidence intervals: Bayesian (plain, Markov chain), central Neyman, Feldman-Cousins, …
- Hypothesis test calculator: abstract class for computing p-values, significance, CLs, …

Step-by-step example

- Example presented during the last CMS Data Analysis School (FNAL-Pisa) by Gena Kukartsev
- Create a ROOT macro, say counting.C, with a void function, say MakeWorkspace()
- #include directives and other details are skipped for the sake of simplicity; complete code is available on request

Create the workspace and save it to disk:

    void MakeWorkspace( void ){
      // create workspace
      RooWorkspace * pWs = new RooWorkspace("myWS");

      // save workspace to file
      pWs->SaveAs("workspace.root");

      return;
    }

Define parameters and PDF model

    // create workspace
    RooWorkspace * pWs = new RooWorkspace("myWS");

    // observable: number of events
    pWs->factory( "n[0]" );

    // signal yield
    pWs->factory( "nsig[0,0,100]" );
    // NOTE: the three values are "current value", "lower bound", "upper bound"

    // background yield
    pWs->factory( "nbkg[10,0,100]" );

    // full event yield
    pWs->factory( "sum::yield(nsig,nbkg)" );
    // NOTE: lower-case "sum" creates a function. Upper-case "SUM" would create a PDF

    // Core model: Poisson probability with mean signal+bkg
    pWs->factory( "Poisson::model_core(n,yield)" );
    // NOTE: "model_core" is the name of the PDF object

    // print out the workspace contents
    pWs->Print();

    // save workspace to file
    pWs->SaveAs("workspace.root");

Output from ROOT

    *******************************************
    *                                         *
    *        W E L C O M E  to  R O O T       *
    *                                         *
    *   Version   5.32/00     2 December 2011 *
    *                                         *
    *  You are welcome to visit our Web site  *
    *          http://root.cern.ch            *
    *                                         *
    *******************************************

    ROOT 5.32/00 (tags/v5-32-00@42375, Dec 02 2011, 12:42:25 on linux)

    CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
    Type ? for help. Commands must be C++ statements.
    Enclose multiple statements between { }.
    Loading rootlogon.C...
    root [0] .L counting.C+
    Info in <TUnixSystem::ACLiC>: creating shared library /home/kukarzev/svn/exost/workdir/cmsdas2012/./counting_C.so

    RooFit v3.50 -- Developed by Wouter Verkerke and David Kirkby
                    Copyright (C) 2000-2011 NIKHEF, University of California & Stanford University
                    All rights reserved, please read http://roofit.sourceforge.net/license.txt

    root [1] MakeWorkspace()

    RooWorkspace(myWS) myWS contents

    variables
    ---------
    (n,nbkg,nsig)

    p.d.f.s
    -------
    RooPoisson::model_core[ x=n mean=yield ] = 4.53999e-05

    functions
    --------
    RooAddition::yield[ nsig + nbkg ] = 10
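The value printed for model_core can be checked by hand: with n = 0 and yield = nsig + nbkg = 10, the Poisson probability is exp(-10), about 4.54e-05. The following is a minimal sketch in plain C++ (no ROOT required); the function name poissonProb is hypothetical and is not part of RooFit.

```cpp
#include <cassert>
#include <cmath>

// Poisson probability P(n | mean) = mean^n * exp(-mean) / n!,
// evaluated in log space so that large n does not overflow.
// NOTE: illustrative helper only, not a RooFit function.
double poissonProb(int n, double mean) {
    assert(n >= 0 && mean >= 0.0);
    if (mean == 0.0) return n == 0 ? 1.0 : 0.0;
    return std::exp(n * std::log(mean) - mean - std::lgamma(n + 1.0));
}
```

Calling poissonProb(0, 10.0) reproduces the number shown in the workspace printout.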

Replace nsig by σ×ε×L

    // signal yield
    // pWs->factory( "nsig[0,0,100]" );

    // integrated luminosity
    pWs->factory( "lumi[0]" );

    // cross section - parameter of interest
    pWs->factory( "xsec[0,0,0.1]" );

    // selection efficiency * acceptance
    pWs->factory( "efficiency[0]" );

    pWs->factory( "prod::nsig(lumi,xsec,efficiency)" );

Define prior PDF (uniform)

    // define Bayesian prior PDF for POI
    pWs->factory( "Uniform::prior(xsec)" );

Systematic uncertainty: lumi

A log-normal uncertainty is assumed for the luminosity: $L = L_{\mathrm{nom}}\,\alpha_{\mathrm{lumi}}$, where $\alpha_{\mathrm{lumi}} = \kappa^{\beta_{\mathrm{lumi}}}$ and $\beta_{\mathrm{lumi}}$ is the new nuisance parameter, distributed normally. $\kappa = 1.045$ is equivalent to a 4.5% uncertainty on $L_{\mathrm{nom}}$.

    // integrated luminosity
    // pWs->factory( "lumi[0]" );

    // integrated luminosity with systematics
    pWs->factory( "lumi_nom[5000.0, 4000.0, 6000.0]" );
    pWs->factory( "lumi_kappa[1.045]" );
    pWs->factory( "cexpr::alpha_lumi('pow(lumi_kappa,beta_lumi)',lumi_kappa,beta_lumi[0,-5,5])" );
    pWs->factory( "prod::lumi(lumi_nom,alpha_lumi)" );
    pWs->factory( "Gaussian::constr_lumi(beta_lumi,glob_lumi[0,-5,5],1)" );
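To see what the log-normal parametrisation does numerically, here is a small sketch in plain C++ that evaluates the scale factor kappa^beta and the resulting luminosity. The function names are hypothetical; the numbers (nominal 5000, kappa = 1.045) are the ones used in the slide.

```cpp
#include <cassert>
#include <cmath>

// Multiplicative factor alpha = kappa^beta for a log-normal uncertainty.
// beta = 0 gives the nominal value; beta = +1 or -1 is a one-sigma shift.
// NOTE: illustrative helper only, not a RooFit function.
double logNormalFactor(double kappa, double beta) {
    assert(kappa > 0.0);
    return std::pow(kappa, beta);
}

// Luminosity after applying the nuisance parameter: L = L_nom * alpha.
double luminosity(double lumiNom, double kappa, double beta) {
    return lumiNom * logNormalFactor(kappa, beta);
}
```

With these inputs, beta = +1 scales the luminosity up to 5000 × 1.045 = 5225, while beta = -1 scales it down to 5000 / 1.045. The factor stays positive for any beta, which is one reason to prefer a log-normal over a plain Gaussian for a multiplicative uncertainty.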

Lumi uncertainty (cont.)

Build the PDF model from a "core" model:

    // Core model: Poisson probability with mean signal+bkg
    pWs->factory( "Poisson::model_core(n,yield)" );
    ...
    // model with systematics
    pWs->factory( "PROD::model(model_core,constr_lumi)" );

Luminosity also affects the background normalization: $n_{\mathrm{bkg}} = n_{\mathrm{bkg}}^{\mathrm{nom}}\,\alpha_{\mathrm{lumi}}$

    // background yield
    // pWs->factory( "nbkg[10,0,100]" );
    pWs->factory( "nbkg_nom[10]" );
    pWs->factory( "prod::nbkg(nbkg_nom,alpha_lumi)" );

Systematic uncertainty: efficiency

Proceed similarly for the efficiency (10% uncertainty):

    // selection efficiency * acceptance
    // pWs->factory( "efficiency[0]" );

    // selection efficiency * acceptance with systematics
    pWs->factory( "efficiency_nom[0.1, 0.05, 0.15]" );
    pWs->factory( "efficiency_kappa[1.10]" );
    pWs->factory( "cexpr::alpha_efficiency('pow(efficiency_kappa,beta_efficiency)',efficiency_kappa,beta_efficiency[0,-5,5])" );
    pWs->factory( "prod::efficiency(efficiency_nom,alpha_efficiency)" );
    pWs->factory( "Gaussian::constr_efficiency(beta_efficiency,glob_efficiency[0,-5,5],1)" );

    // model with systematics
    // pWs->factory( "PROD::model(model_core,constr_lumi)" );
    pWs->factory( "PROD::model(model_core,constr_lumi,constr_efficiency)" );

Systematic uncertainty: nbkg

Proceed similarly for nbkg (10% uncertainty):

    // background yield
    // pWs->factory( "nbkg_nom[10]" );

    // background yield with systematics
    pWs->factory( "nbkg_nom[10.0, 5.0, 15.0]" );
    pWs->factory( "nbkg_kappa[1.10]" );
    pWs->factory( "cexpr::alpha_nbkg('pow(nbkg_kappa,beta_nbkg)',nbkg_kappa,beta_nbkg[0,-5,5])" );
    pWs->factory( "prod::nbkg(nbkg_nom,alpha_lumi,alpha_nbkg)" );
    pWs->factory( "Gaussian::constr_nbkg(beta_nbkg,glob_nbkg[0,-5,5],1)" );

    // model with systematics
    // pWs->factory( "PROD::model(model_core,constr_lumi,constr_efficiency)" );
    pWs->factory( "PROD::model(model_core,constr_lumi,constr_efficiency,constr_nbkg)" );
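Putting the pieces together, the full model is a Poisson term for the observed count multiplied by three unit-width Gaussian constraints on the nuisance parameters. The sketch below writes out the corresponding negative log-likelihood in plain C++, assuming the global observables are fixed at 0 and using the nominal values and kappa factors from the slides as hard-coded constants. The struct and function names are hypothetical; in the workspace, RooFit builds this quantity from the factory expressions.

```cpp
#include <cassert>
#include <cmath>

// Nuisance parameters beta_lumi, beta_efficiency, beta_nbkg.
struct Nuisances {
    double betaLumi;
    double betaEff;
    double betaNbkg;
};

// Expected event yield: nsig + nbkg, with
//   nsig = lumi * xsec * efficiency
//   nbkg = nbkg_nom * alpha_lumi * alpha_nbkg
// Constants are the nominal values and kappa factors quoted in the slides.
double expectedYield(double xsec, const Nuisances& b) {
    const double alphaLumi = std::pow(1.045, b.betaLumi);
    const double alphaEff  = std::pow(1.10, b.betaEff);
    const double alphaNbkg = std::pow(1.10, b.betaNbkg);
    const double lumi = 5000.0 * alphaLumi;
    const double eff  = 0.1 * alphaEff;
    const double nsig = lumi * xsec * eff;
    const double nbkg = 10.0 * alphaLumi * alphaNbkg;
    return nsig + nbkg;
}

// -log L = -log Poisson(n | yield) - sum of log Gaussian(beta | 0, 1).
// ASSUMPTION: global observables glob_* are all 0, as in the slides.
double negLogLikelihood(int n, double xsec, const Nuisances& b) {
    const double mu = expectedYield(xsec, b);
    assert(n >= 0 && mu > 0.0);
    const double kHalfLog2Pi = 0.9189385332046727;  // 0.5 * log(2 * pi)
    const double nllPoisson = mu - n * std::log(mu) + std::lgamma(n + 1.0);
    const double nllConstr =
        0.5 * (b.betaLumi * b.betaLumi + b.betaEff * b.betaEff +
               b.betaNbkg * b.betaNbkg) + 3.0 * kHalfLog2Pi;
    return nllPoisson + nllConstr;
}
```

As a sanity check, with all nuisance parameters at 0 the yield is 10 for xsec = 0 and 15 for xsec = 0.01 (5000 × 0.01 × 0.1 = 5 signal events on top of 10 background events).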

Define dataset

Use a RooFit data set as the data container:

    // create set of observables (will need it for datasets and ModelConfig later)
    RooRealVar * pObs = pWs->var("n"); // get the pointer to the observable
    RooArgSet obs("observables");
    obs.add(*pObs);

    // create the dataset
    pObs->setVal(11); // this is your observed data: you counted eleven events
    RooDataSet * data = new RooDataSet("data", "data", obs);
    data->add( *pObs );

    // import dataset into workspace
    pWs->import(*data);

Model configuration

    // create set of global observables (need to be defined as constants!)
    pWs->var("glob_lumi")->setConstant(true);
    pWs->var("glob_efficiency")->setConstant(true);
    pWs->var("glob_nbkg")->setConstant(true);
    RooArgSet globalObs("global_obs");
    globalObs.add( *pWs->var("glob_lumi") );
    globalObs.add( *pWs->var("glob_efficiency") );
    globalObs.add( *pWs->var("glob_nbkg") );

    // create set of parameters of interest (POI)
    RooArgSet poi("poi");
    poi.add( *pWs->var("xsec") );

    // create set of nuisance parameters
    RooArgSet nuis("nuis");
    nuis.add( *pWs->var("beta_lumi") );
    nuis.add( *pWs->var("beta_efficiency") );
    nuis.add( *pWs->var("beta_nbkg") );

    // fix all other variables in model:
    // everything except observables, POI, and nuisance parameters
    // must be constant
    pWs->var("lumi_nom")->setConstant(true);
    pWs->var("efficiency_nom")->setConstant(true);
    pWs->var("nbkg_nom")->setConstant(true);
    pWs->var("lumi_kappa")->setConstant(true);
    pWs->var("efficiency_kappa")->setConstant(true);
    pWs->var("nbkg_kappa")->setConstant(true);
    RooArgSet fixed("fixed");
    fixed.add( *pWs->var("lumi_nom") );
    fixed.add( *pWs->var("efficiency_nom") );
    fixed.add( *pWs->var("nbkg_nom") );
    fixed.add( *pWs->var("lumi_kappa") );
    fixed.add( *pWs->var("efficiency_kappa") );
    fixed.add( *pWs->var("nbkg_kappa") );

    // create signal+background Model Config
    RooStats::ModelConfig sbHypo("SbHypo");
    sbHypo.SetWorkspace( *pWs );
    sbHypo.SetPdf( *pWs->pdf("model") );
    sbHypo.SetObservables( obs );
    sbHypo.SetGlobalObservables( globalObs );
    sbHypo.SetParametersOfInterest( poi );
    sbHypo.SetNuisanceParameters( nuis );
    // this is optional, for Bayesian analysis
    sbHypo.SetPriorPdf( *pWs->pdf("prior") );

    // import ModelConfig into workspace
    pWs->import( sbHypo );

Parameter snapshot

A parameter snapshot consists of saved values of a subset of model parameters, which can be loaded at any time. A useful snapshot holds the values of the POI and nuisance parameters that correspond to the best fit to the experimental data.

    // set parameter snapshot that corresponds to the best fit to data
    RooAbsReal * pNll = sbHypo.GetPdf()->createNLL( *data );
    // do not profile global observables
    RooAbsReal * pProfile = pNll->createProfile( globalObs );
    // this will do the fit and set POI and nuisance parameters to fitted values
    pProfile->getVal();
    RooArgSet * pPoiAndNuisance = new RooArgSet("poiAndNuisance");
    pPoiAndNuisance->add(*sbHypo.GetNuisanceParameters());
    pPoiAndNuisance->add(*sbHypo.GetParametersOfInterest());
    sbHypo.SetSnapshot(*pPoiAndNuisance);
    delete pProfile;
    delete pNll;
    delete pPoiAndNuisance;

    // import S+B ModelConfig into workspace
    pWs->import( sbHypo );

Adding more hypothesis models

More than one ModelConfig can be added to the workspace for later use.

    // create background-only Model Config from the S+B one
    RooStats::ModelConfig bHypo = sbHypo;
    bHypo.SetName("BHypo");
    bHypo.SetWorkspace(*pWs);

    // set parameter snapshot for bHypo, setting xsec=0
    // it is useful to understand how this block of code works
    // but you can also use it as a recipe to make a parameter snapshot
    pNll = bHypo.GetPdf()->createNLL( *data );
    RooArgSet poiAndGlobalObs("poiAndGlobalObs");
    poiAndGlobalObs.add( poi );
    poiAndGlobalObs.add( globalObs );
    // do not profile POI and global observables
    pProfile = pNll->createProfile( poiAndGlobalObs );
    ((RooRealVar *)poi.first())->setVal( 0 ); // set xsec=0 here
    pProfile->getVal(); // this will do the fit and set nuisance parameters to profiled values
    pPoiAndNuisance = new RooArgSet( "poiAndNuisance" );
    pPoiAndNuisance->add( nuis );
    pPoiAndNuisance->add( poi );
    bHypo.SetSnapshot(*pPoiAndNuisance);
    delete pProfile;
    delete pNll;
    delete pPoiAndNuisance;

    // import model config into workspace
    pWs->import( bHypo );

Using a workspace

- Create a new ROOT macro, say bayesian_num.C, with a function, say GetBayesianInterval()
- #include directives and other details are again skipped

    int GetBayesianInterval( std::string filename = "workspace.root",
                             std::string wsname = "myWS" ){
      // open file with workspace for reading
      TFile * pInFile = new TFile(filename.c_str(), "read");

      // load workspace
      RooWorkspace * pWs = (RooWorkspace *)pInFile->Get(wsname.c_str());
      if (!pWs){
        std::cout << "workspace " << wsname << " not found" << std::endl;
        return -1;
      }

      // print out workspace content
      pWs->Print();

      // load and print data from workspace
      RooAbsData * data = pWs->data("data");
      data->Print();

      // load and print S+B Model Config
      RooStats::ModelConfig * pSbHypo = (RooStats::ModelConfig *)pWs->obj("SbHypo");
      pSbHypo->Print();

      // (the limit-setting code from the following slides goes here)

      return 0;
    }

Compute limits (Bayesian)

    // create RooStats Bayesian calculator and set parameters
    RooStats::BayesianCalculator bCalc(*data, *pSbHypo);
    bCalc.SetName("myBC");
    bCalc.SetConfidenceLevel(0.95);
    bCalc.SetLeftSideTailFraction(0.0);
    // bCalc.SetIntegrationType("ROOFIT");

    // estimate credible interval
    // NOTE: unfortunate notation: the UpperLimit() name refers
    //       to the upper boundary of an interval,
    //       NOT to the upper limit on the parameter of interest
    //       (it just happens to be the same for the one-sided
    //       interval starting at 0)
    RooStats::SimpleInterval * pSInt = bCalc.GetInterval();
    double upper_bound = pSInt->UpperLimit();
    double lower_bound = pSInt->LowerLimit();
    std::cout << "one-sided 95% C.L. Bayesian "
                 "credible interval for xsec: ["
              << lower_bound << ", " << upper_bound << "]"
              << std::endl;

    // make posterior PDF plot for POI
    TCanvas c1("posterior");
    bCalc.SetScanOfPosterior(100);
    RooPlot * pPlot = bCalc.GetPosteriorPlot();
    pPlot->Draw();
    c1.SaveAs("bayesian_num_posterior.pdf");

    // clean up a little
    delete pSInt;
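The idea behind a one-sided Bayesian credible interval can be illustrated without RooStats. The sketch below is a simplified stand-in, not what BayesianCalculator does internally: it ignores all systematic uncertainties, works with the signal yield s rather than xsec, assumes a flat prior on [0, sMax], and integrates the posterior with a simple midpoint rule. All function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Unnormalised posterior density for the signal yield s under a flat prior:
// proportional to the Poisson likelihood of n events with mean s + b.
double posteriorDensity(double s, int n, double b) {
    const double mu = s + b;
    if (mu <= 0.0) return n == 0 ? 1.0 : 0.0;
    return std::exp(n * std::log(mu) - mu - std::lgamma(n + 1.0));
}

// Upper edge of the one-sided credible interval [0, s_up] that contains
// a fraction cl of the posterior (left-side tail fraction = 0).
// ASSUMPTION: flat prior truncated at sMax, no nuisance parameters.
double bayesianUpperLimit(int n, double b, double cl,
                          double sMax = 100.0, int nSteps = 100000) {
    assert(n >= 0 && b >= 0.0 && cl > 0.0 && cl < 1.0);
    assert(sMax > 0.0 && nSteps > 0);
    const double h = sMax / nSteps;
    std::vector<double> cumulative(nSteps);
    double total = 0.0;
    for (int i = 0; i < nSteps; ++i) {
        total += posteriorDensity((i + 0.5) * h, n, b) * h;  // midpoint rule
        cumulative[i] = total;
    }
    for (int i = 0; i < nSteps; ++i) {
        if (cumulative[i] >= cl * total) return (i + 1) * h;  // right bin edge
    }
    return sMax;
}
```

For n = 0 the posterior is a falling exponential in s whatever the background, so the 95% limit is -ln(0.05), about 3.0 events, which makes a convenient cross-check. Under the nominal values used in the slides (luminosity 5000, efficiency 0.1), a limit on s would translate into a limit on xsec by dividing by 500; the real calculation additionally marginalises over the three nuisance parameters.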

Bayesian Markov Chain

Copy bayesian_num.C to bayesian_mcmc.C and insert the code below:

    // Metropolis-Hastings algorithm needs a proposal function
    RooStats::SequentialProposal sp(10.0);

    RooStats::MCMCCalculator mcmc( *data, *pSbHypo );
    mcmc.SetConfidenceLevel(0.95);
    mcmc.SetNumIters(100000); // number of iterations
    mcmc.SetProposalFunction(sp);
    // first N steps to be ignored as burn-in
    mcmc.SetNumBurnInSteps(500);
    mcmc.SetLeftSideTailFraction(0.0);
    // binning for plotting only
    mcmc.SetNumBins(40);

    // estimate credible interval
    RooStats::MCMCInterval * pMcmcInt = mcmc.GetInterval();
    double upper_bound = pMcmcInt->UpperLimit( *pWs->var("xsec") );
    double lower_bound = pMcmcInt->LowerLimit( *pWs->var("xsec") );
    std::cout << "one-sided 95% C.L. Bayesian "
                 "credible interval for xsec: ["
              << lower_bound << ", " << upper_bound << "]"
              << std::endl;

    // make posterior PDF plot for POI
    TCanvas c1("posterior");
    RooStats::MCMCIntervalPlot plot(*pMcmcInt);
    plot.Draw();
    c1.SaveAs("bayesian_mcmc_posterior.pdf");

    // make scatter plots to visualise the Markov chain
    TCanvas c2("xsec_vs_beta_lumi");
    plot.DrawChainScatter( *pWs->var("xsec"), *pWs->var("beta_lumi"));
    c2.SaveAs("scatter_mcmc_xsec_vs_beta_lumi.pdf");

    TCanvas c3("xsec_vs_beta_efficiency");
    plot.DrawChainScatter( *pWs->var("xsec"), *pWs->var("beta_efficiency"));
    c3.SaveAs("scatter_xsec_vs_beta_efficiency.pdf");

    TCanvas c4("xsec_vs_beta_nbkg");
    plot.DrawChainScatter( *pWs->var("xsec"), *pWs->var("beta_nbkg"));
    c4.SaveAs("scatter_xsec_vs_beta_nbkg.pdf");

    // clean up a little
    delete pMcmcInt;
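For readers who want to see the mechanics of a Markov chain limit without RooStats, here is a minimal random-walk Metropolis sketch in plain C++. It is not the SequentialProposal scheme used above: it samples only the signal yield s (no nuisance parameters), uses a symmetric Gaussian proposal, and assumes a flat prior on [0, sMax]. The function names, default seed, step size and iteration count are all illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Log of the unnormalised posterior for the signal yield s:
// flat prior on [0, sMax] times the Poisson likelihood with mean s + b.
double logPosterior(double s, int n, double b, double sMax) {
    const double negInf = -std::numeric_limits<double>::infinity();
    if (s < 0.0 || s > sMax) return negInf;  // outside the prior support
    const double mu = s + b;
    if (mu <= 0.0) return n == 0 ? 0.0 : negInf;
    return n * std::log(mu) - mu;
}

// Random-walk Metropolis estimate of the one-sided credible upper limit.
// ASSUMPTION: single parameter, symmetric Gaussian proposal, fixed seed.
double mcmcUpperLimit(int n, double b, double cl, unsigned seed = 12345,
                      int nIters = 200000, int nBurnIn = 500,
                      double stepSize = 2.0, double sMax = 100.0) {
    assert(n >= 0 && b >= 0.0 && cl > 0.0 && cl < 1.0);
    assert(nIters > nBurnIn && nBurnIn >= 0 && stepSize > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> proposal(0.0, stepSize);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    double s = 1.0;  // starting point inside the prior support
    double logP = logPosterior(s, n, b, sMax);
    std::vector<double> chain;
    chain.reserve(static_cast<std::size_t>(nIters - nBurnIn));

    for (int i = 0; i < nIters; ++i) {
        const double sNew = s + proposal(rng);
        const double logPNew = logPosterior(sNew, n, b, sMax);
        // accept with probability min(1, posterior ratio)
        if (std::log(uniform(rng)) < logPNew - logP) {
            s = sNew;
            logP = logPNew;
        }
        if (i >= nBurnIn) chain.push_back(s);  // discard burn-in steps
    }

    // the cl-quantile of the chain is the upper edge of the interval [0, s_up]
    std::sort(chain.begin(), chain.end());
    const std::size_t idx =
        static_cast<std::size_t>(cl * static_cast<double>(chain.size() - 1));
    return chain[idx];
}
```

The result fluctuates with the seed and chain length, so it should agree with a direct numerical integration only within the Monte Carlo error. For n = 0 the exact answer is -ln(0.05), about 3.0, which gives a quick check that the chain has converged.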

Bayesian MCMC plots

[Plots not reproduced in this transcript]