Bayesian Statistics in Analysis
Harrison B. Prosper, Florida State University
Workshop on Top Physics: from the TeVatron to the LHC, Grenoble, October 19, 2007

Outline
- Introduction
- Inference
- Model Selection
- Summary

Introduction
Blaise Pascal (1670), Thomas Bayes (1763), Pierre-Simon de Laplace (1812)

Introduction
Let P(A) and P(B) be probabilities assigned to statements, or events, A and B, and let P(AB) be the probability assigned to the joint statement AB. The conditional probability of A given B is then defined by

P(A|B) = P(AB) / P(B).

P(A) is the probability of A without the restriction specified by B; P(A|B) is the probability of A when we restrict to the conditions specified by statement B.

Introduction
From the definition of conditional probability we deduce immediately Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B).

Bayesian statistics is the application of Bayes' theorem to problems of inference.

Inference

Inference
The Bayesian approach to inference is conceptually simple and always the same:

1. Compute Pr(Data|Model).
2. Compute Pr(Model|Data) = Pr(Data|Model) Pr(Model) / Pr(Data).

Pr(Model) is called the prior. It is the probability assigned to the Model irrespective of the Data.
Pr(Data|Model) is called the likelihood.
Pr(Model|Data) is called the posterior probability.
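
As a concrete illustration of this recipe (a toy sketch, not part of the original slides), the following Python snippet takes the "model" to be a Poisson mean on a discrete grid, assumes a flat prior and an invented observed count N = 12, and applies the two steps directly:

```python
import numpy as np
from scipy.stats import poisson

mu_grid = np.linspace(0.1, 30.0, 300)                 # candidate models: Poisson means
prior   = np.full_like(mu_grid, 1.0 / mu_grid.size)   # Pr(Model), here taken flat

N = 12                                                # toy observed count
likelihood = poisson.pmf(N, mu_grid)                  # Pr(Data|Model)

posterior = likelihood * prior                        # Bayes' theorem, up to Pr(Data)
posterior /= posterior.sum()                          # divide by Pr(Data)

print("posterior mean of mu:", np.sum(mu_grid * posterior))
```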

Inference
In practice, inference is done using the continuous form of Bayes' theorem:

p(θ|x) = ∫ p(x|θ, ω) π(θ, ω) dω / p(x),

where p(x|θ, ω) is the likelihood, π(θ, ω) is the prior density, p(θ|x) is the posterior density, and the integral over ω is the marginalization. Here θ are the parameters of interest and ω denotes all other parameters of the problem, which are referred to as nuisance parameters.
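
The marginalization step is easy to carry out numerically on a grid. Below is a minimal sketch (a toy Gaussian example, not from the slides): the datum x is modeled as Gaussian with mean θ + ω, θ is the parameter of interest, ω is a nuisance parameter with its own prior, and the nuisance parameter is integrated out by summing over its grid:

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-5.0, 15.0, 401)[:, None]   # parameter of interest (column)
omega = np.linspace(-5.0,  5.0, 201)[None, :]   # nuisance parameter (row)
d_omega = omega[0, 1] - omega[0, 0]

x_obs, sigma = 4.2, 1.0                                        # toy datum and known resolution
likelihood = norm.pdf(x_obs, loc=theta + omega, scale=sigma)   # p(x|theta, omega)
prior      = norm.pdf(omega, loc=0.0, scale=1.5)               # pi(theta, omega), flat in theta

joint = likelihood * prior                          # un-normalized joint posterior density
post_theta = joint.sum(axis=1) * d_omega            # marginalize over the nuisance omega
post_theta /= np.trapz(post_theta, theta.ravel())   # normalized p(theta|x)
```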

Example 1
Model: the observed count N is Poisson distributed with mean s + b, where s is the mean signal count and b is the mean background count.
Datum: the observed count N.
Likelihood: P(N|s, b) = Poisson(N, s + b) = (s + b)^N exp[−(s + b)] / N!
Task: infer s, given N and the available prior information.

Example 1
Apply Bayes' theorem:

p(s, b|N) = P(N|s, b) π(s, b) / P(N)    (posterior = likelihood × prior / evidence)

π(s, b) is the prior density for s and b, which encodes our prior knowledge of the signal and background means. The encoding is often difficult and can be controversial.

Example 1
First factor the prior: π(s, b) = π(b|s) π(s). Define the marginal likelihood

p(N|s) = ∫ P(N|s, b) π(b|s) db

and write the posterior density for the signal as

p(s|N) = p(N|s) π(s) / ∫ p(N|s) π(s) ds.

Example 1: The Background Prior Density
Suppose that the background has been estimated from a Monte Carlo simulation of the background process, yielding B events that pass the cuts. Assume that the probability for the count B is given by P(B|λ) = Poisson(B, λ), where λ is the (unknown) mean count of the Monte Carlo sample. We can infer the value of λ by applying Bayes' theorem to the Monte Carlo background experiment:

p(λ|B) = P(B|λ) π(λ) / ∫ P(B|λ) π(λ) dλ.

Example 1: The Background Prior Density
Assuming a flat prior π(λ) = constant, we find p(λ|B) = Gamma(λ; 1, B + 1) = λ^B exp(−λ) / B!. Often the mean background count b in the real experiment is related to the mean count λ in the Monte Carlo experiment linearly, b = kλ, where k is an accurately known scale factor, for example the ratio of the data to Monte Carlo integrated luminosities. The background can then be estimated as

b ≈ k (B + 1) ± k √(B + 1).
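
A minimal numerical version of this background estimate (the Monte Carlo count B and scale factor k below are invented for illustration): with a flat prior the posterior for λ is a Gamma(B + 1, 1) density, and scaling by k gives the estimate quoted above.

```python
from scipy.stats import gamma

B, k = 25, 0.4                            # assumed MC count and MC-to-data scale factor
post_lambda = gamma(a=B + 1, scale=1.0)   # p(lambda|B) = lambda**B * exp(-lambda) / B!

b_mean = k * post_lambda.mean()           # k * (B + 1)
b_std  = k * post_lambda.std()            # k * sqrt(B + 1)
print(f"estimated background: {b_mean:.2f} +/- {b_std:.2f}")
```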

Example 1: The Background Prior Density
The posterior density p(λ|B) now serves as the prior density for the background b in the real experiment, π(b) = p(λ|B) with b = kλ. We can then write the marginal likelihood as

p(N|s) = ∫ P(N|s, b) π(b) db = ∫ P(N|s, kλ) p(λ|B) dλ

and use it in the posterior density for the signal.

Example 1
The calculation of the marginal likelihood yields

p(N|s) = Σ_{r=0..N} [exp(−s) s^r / r!] × C(N − r + B, B) [k / (1 + k)]^(N−r) [1 / (1 + k)]^(B+1),

that is, a sum of Poisson signal terms weighted by negative-binomial background terms. A numerical cross-check of this expression is sketched below.
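
Since this closed form is easy to get wrong, here is a small cross-check (all numbers invented): the sum above should agree with a direct numerical integration of Poisson(N|s + kλ) against the Gamma posterior for λ. The sum is written with scipy's nbinom, using B + 1 "successes" and success probability 1/(1 + k).

```python
import numpy as np
from scipy.stats import poisson, gamma, nbinom
from scipy.integrate import quad

N, B, k, s = 7, 25, 0.4, 3.0   # assumed observed count, MC count, scale factor, signal mean

# direct numerical integration over the Monte Carlo mean lambda
integrand = lambda lam: poisson.pmf(N, s + k * lam) * gamma.pdf(lam, a=B + 1)
p_numeric, _ = quad(integrand, 0.0, 200.0)         # 200 is far past the bulk of the Gamma density

# closed-form sum quoted above
p_sum = sum(poisson.pmf(r, s) * nbinom.pmf(N - r, B + 1, 1.0 / (1.0 + k))
            for r in range(N + 1))

print(p_numeric, p_sum)                            # the two numbers agree
```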

Example 2: Top Quark Mass (Run I)
The data are partitioned into K bins and modeled by a sum of N sources of strengths p = (p_1, ..., p_N). The numbers A_ji are the source distributions for model M, so the expected count in bin i is Σ_j p_j A_ji. Each model M corresponds to a different top signal + background model.
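
A schematic Python version of this binned, multi-source model (the templates, bin counts, and source strengths below are all invented): each source j contributes p_j times its normalized bin distribution A[j, i], and the bin counts are Poisson distributed about the summed prediction.

```python
import numpy as np
from scipy.stats import poisson

A = np.array([[0.1, 0.3, 0.4, 0.2],    # source 1 template (e.g. a top-signal model M)
              [0.4, 0.3, 0.2, 0.1]])   # source 2 template (e.g. background)
d = np.array([9, 14, 17, 8])           # observed counts in the K = 4 bins

def log_likelihood(p, A=A, d=d):
    """log L(p) = sum_i log Poisson(d_i | sum_j p_j * A[j, i])."""
    mu = p @ A                          # expected count in each bin
    return poisson.logpmf(d, mu).sum()

print(log_likelihood(np.array([20.0, 30.0])))   # strengths p for the two sources
```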

Example 2: Top Quark Mass (Run I)
[Figure: the probability of model M, P(M|d), plotted versus the top quark mass (GeV/c^2).]
m_top = ± 4.5 GeV
s = 33 ± 8 events
b = 50.8 ± 8.3 events

To Bin Or Not To Bin
Binned – Pros:
- Likelihood can be modeled accurately
- Bins with low counts can be handled exactly
- Statistical uncertainties are handled exactly
Binned – Cons:
- Information loss can be severe
- Suffers from the curse of dimensionality

To Bin Or Not To Bin
Binned likelihoods do work!

To Bin Or Not To Bin
Un-binned – Pros:
- No loss of information (in principle)
Un-binned – Cons:
- Can be difficult to model the likelihood accurately; requires fitting (either parametric or KDE)
- The error in the likelihood grows approximately linearly with the sample size, so at the LHC large sample sizes could become an issue

Un-binned Likelihood Functions
Start with the standard binned likelihood over K bins,

L = ∏_{i=1..K} Poisson(d_i, s_i + b_i) = ∏_{i=1..K} (s_i + b_i)^(d_i) exp[−(s_i + b_i)] / d_i!,

where d_i is the observed count in bin i and the model for each bin is a mean signal count s_i plus a mean background count b_i.

Un-binned Likelihood Functions
Make the bins smaller and smaller; in the limit of zero bin width the likelihood becomes

L(σ) = exp[−(Aσ + B)] ∏_{i=1..K} [σ a(x_i) + b(x_i)],

where K is now the number of events, a(x) and b(x) are the effective luminosity and background densities, respectively, and A and B are their integrals.
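
A sketch of this un-binned likelihood in Python (the densities a(x) and b(x), their integrals, and the event "marks" are all invented; only the structure of the formula is the point):

```python
import numpy as np
from scipy.stats import norm, expon

def a(x): return 50.0 * norm.pdf(x, loc=5.0, scale=1.0)   # toy effective-luminosity density
def b(x): return 30.0 * expon.pdf(x, scale=3.0)           # toy background density
A_int, B_int = 50.0, 30.0                                 # their integrals A and B

def log_L(sigma, x_events):
    """log L(sigma) = -(A*sigma + B) + sum_i log[sigma*a(x_i) + b(x_i)]."""
    return -(A_int * sigma + B_int) + np.sum(np.log(sigma * a(x_events) + b(x_events)))

x_events = np.array([4.2, 5.1, 0.7, 6.3, 2.2])            # toy marks x_i, one per event
print(log_L(1.0, x_events))
```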

Un-binned Likelihood Functions
The un-binned likelihood function is an example of a marked Poisson likelihood. Each event is marked by the discriminating variable x_i, which could be multi-dimensional. The various methods for measuring the top cross section and mass differ in the choice of discriminating variables x.

Un-binned Likelihood Functions
Note: since the functions a(x) and b(x) have to be modeled, they will depend on sets of modeling parameters, α and β respectively. Therefore, in general, the un-binned likelihood function is

L(σ, α, β) = exp{−[A(α)σ + B(β)]} ∏_{i=1..K} [σ a(x_i, α) + b(x_i, β)],

which must be combined with a prior density π(σ, α, β) to compute the posterior density for the cross section.

Computing the Un-binned Likelihood Function
If we write s(x) = a(x)σ and S = Aσ, we can re-write the un-binned likelihood function as

L(σ) = exp[−(S + B)] ∏_{i=1..K} [s(x_i) + b(x_i)].

Since a likelihood function is defined only to within a scaling by a parameter-independent quantity, we are free to scale it by, for example, the observed distribution d(x):

L(σ) ∝ exp[−(S + B)] ∏_{i=1..K} [s(x_i) + b(x_i)] / d(x_i).

Computing the Un-binned Likelihood Function
One way to approximate the ratio [s(x) + b(x)] / d(x) is with a neural network function trained on an admixture of data, signal and background in the ratio 2:1:1. If the training can be done accurately enough, the network will approximate

n(x) = [s(x) + b(x)] / [s(x) + b(x) + d(x)],

in which case we can then write

[s(x) + b(x)] / d(x) = n(x) / [1 − n(x)].
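
Below is a rough sketch of this idea with a generic scikit-learn classifier standing in for the neural network (the one-dimensional toy samples, sample sizes, and network settings are all invented). The classifier separates "data" (label 0) from an equal-sized signal-plus-background admixture (label 1), so its output approximates n(x), and n/(1 − n) approximates [s(x) + b(x)] / d(x):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
sig  = rng.normal(5.0, 1.0, 1000)        # toy signal sample
bkg  = rng.exponential(3.0, 1000)        # toy background sample
data = rng.exponential(3.0, 2000)        # toy "data" sample

# admixture of data : signal : background = 2 : 1 : 1
X = np.concatenate([data, sig, bkg]).reshape(-1, 1)
y = np.concatenate([np.zeros(2000), np.ones(2000)])   # 0 = data, 1 = signal + background

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)

n = clf.predict_proba(np.array([[4.0]]))[0, 1]   # estimate of n(x) at x = 4
ratio = n / (1.0 - n)                            # estimate of [s(x) + b(x)] / d(x)
```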

Model Selection

Model Selection
Model selection can also be addressed using Bayes' theorem. It requires computing the posterior probability of each model M,

p(M|x) = p(x|M) p(M) / p(x),

where the evidence for model M is defined by

p(x|M) = ∫ p(x|θ_M, M) π(θ_M|M) dθ_M.
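
When a model has free parameters, the evidence is just this integral done explicitly. Here is a small sketch (a toy Poisson counting model with an unknown signal mean and an invented flat, proper prior range):

```python
from scipy.stats import poisson
from scipy.integrate import quad

N, b = 12, 5.0                 # toy observed count and known background mean
s_max = 30.0                   # assumed range of the flat, proper prior pi(s) = 1/s_max

# evidence p(N|M) = integral of P(N|s, M) * pi(s) ds for the signal-plus-background model
evidence, _ = quad(lambda s: poisson.pmf(N, s + b) / s_max, 0.0, s_max)
print(evidence)
```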

Model Selection
p(M|x) / p(N|x) = [p(x|M) / p(x|N)] × [p(M) / p(N)]
(posterior odds = Bayes factor × prior odds)

The Bayes factor, B_MN = p(x|M) / p(x|N), or any one-to-one function thereof, can be used to choose between two competing models M and N, e.g. signal + background versus background only. However, one must be careful to use proper priors.

Model Selection – Example
Consider the following two prototypical models:

Model 1: P(N|M_1) = Poisson(N, s + b)   (signal + background)
Model 2: P(N|M_2) = Poisson(N, b)       (background only)

with the signal and background means s and b known. The Bayes factor for these models is given by

B_12 = P(N|M_1) / P(N|M_2) = (1 + s/b)^N exp(−s).
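
The same Bayes factor can be evaluated numerically in one line (the values of s, b, and N below are invented):

```python
from scipy.stats import poisson

s, b, N = 10.0, 50.0, 65                           # assumed known signal, background, observed count
B12 = poisson.pmf(N, s + b) / poisson.pmf(N, b)    # equals (1 + s/b)**N * exp(-s)
print(B12)
```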

Model Selection – Example: Calibration of Bayes Factors
Consider the quantity (called the Kullback-Leibler divergence)

k(2||1) = Σ_N P(N|M_2) ln [P(N|M_2) / P(N|M_1)].

For the simple Poisson models with known signal and background, it is easy to show that

k(2||1) = s − b ln(1 + s/b).

For s << b, we get √(2 k(2||1)) ≈ s/√b. That is, roughly speaking, for s << b, √(2 ln B_12) ≈ s/√b.
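
A quick numerical check of this calibration (toy values of s and b, with the count N set to its expected value s + b under model 1, which is one natural way to read the rough statement above):

```python
import numpy as np

s, b = 5.0, 500.0                                # assumed s << b
lnB12 = (s + b) * np.log(1.0 + s / b) - s        # ln B12 evaluated at N = s + b
print(np.sqrt(2.0 * lnB12), s / np.sqrt(b))      # approximately 0.223 versus 0.224
```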

Summary
Bayesian statistics is a well-founded and general framework for thinking about and solving analysis problems, including:
- Analysis design
- Modeling uncertainty
- Parameter estimation
- Interval estimation (limit setting)
- Model selection
- Signal/background discrimination
and more. It is well worth learning how to think this way!