Bayesian Methods in Particle Physics: From Small-N to Large
Harrison B. Prosper
Florida State University
SCMA IV, June 2006

Outline
• Measuring Zero
• Bayesian Fit
• Finding Needles in Haystacks
• Summary

Measuring Zero

Measuring Zero – 1
In the mid-1980s, an experiment at the Institut Laue-Langevin (Grenoble, France) searched for evidence of neutron–antineutron oscillations, a characteristic prediction of certain Grand Unified Theories.

CRISP Experiment, Institut Laue-Langevin
[Schematic: neutron gas inside a magnetic shield; with the field on, the oscillation signal is suppressed, giving the count B; with the field off, the count is N.]

Measuring Zero – 2
Count the number of signal + background events, N. Suppress the putative signal and count the background events, B, independently.
Results: N = 3, B = 7

Measuring Zero – 3
Classic 2-parameter counting experiment:
N ~ Poisson(s + b)
B ~ Poisson(b)
Infer a statement of the form Pr[s < u(N, B)] ≥ 0.9.

Measuring Zero – 4
In 1984, no exact solution existed in the particle physics literature! Moreover, calculating exact confidence intervals is, according to Kendall and Stuart, “a matter of very considerable difficulty.”

Measuring Zero – 5
Exact in what way? Over some ensemble of statements of the form 0 < s < u(N, B), at least 90% of them should be true, whatever the true values of s and b. (Neyman, 1937)

Measuring Zero – 6
Tried a Bayesian approach:
f(s, b|N) = f(N|s, b) π(s, b) / f(N)
          = f(N|s, b) π(b|s) π(s) / f(N)
Step 1. Compute the marginal likelihood f(N|s) = ∫ f(N|s, b) π(b|s) db.
Step 2. Compute f(s|N) = f(N|s) π(s) / ∫ f(N|s) π(s) ds.
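In concrete terms, one possible realization of the two steps is the numerical sketch below. It assumes flat priors on both s and b, so that the sideband count B updates the background prior to π(b|B) = Gamma(B+1, 1); the grid ranges and integration cutoffs are arbitrary choices, not part of the original analysis.

```python
import numpy as np
from scipy import stats, integrate

N_obs, B_obs = 3, 7   # observed counts from the talk
CL = 0.90             # credibility level for the upper limit

# Step 1: marginal likelihood f(N|s) = integral of
# Poisson(N; s+b) * Gamma(b; B+1, 1) over the background b
def marginal_likelihood(s):
    integrand = lambda b: stats.poisson.pmf(N_obs, s + b) * stats.gamma.pdf(b, B_obs + 1)
    return integrate.quad(integrand, 0.0, 50.0)[0]

# Step 2: posterior f(s|N) with a flat prior on s >= 0, normalized on a grid
s_grid = np.linspace(0.0, 20.0, 2001)
like = np.array([marginal_likelihood(s) for s in s_grid])
post = like / integrate.trapezoid(like, s_grid)

# 90% Bayesian upper limit u: Pr[s < u | N] = 0.90
cdf = integrate.cumulative_trapezoid(post, s_grid, initial=0.0)
u = s_grid[np.searchsorted(cdf, CL)]
print(f"90% Bayesian upper limit on s: u = {u:.2f}")
```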

But is there a signal?
1. Hypothesis testing (J. Neyman): H0: s = 0 vs. H1: s > 0
2. p-value (R. A. Fisher): H0: s = 0
3. Decision theory (J. M. Bernardo, 1999): discrepancy, a “distance” between models

Bayesian Fit

Bayesian Fit
Problem: Given counts
  Data: N = N1, N2, …, NM
  Signal model: A = A1, A2, …, AM
  Background model: B = B1, B2, …, BM
where M is the number of bins (or pixels), find the admixture of A and B that best matches the observations N.

Problem (DØ, 2005)
[Plot: Observations = Background model + Signal model (M)]

Bayesian Fit – Details
Assume a model of the form Ni ~ Poisson(a·Ai + b·Bi), with scale factors a and b for the signal and background models, and marginalize over a and b.

Bayesian Fit – Pr(Model)
Moreover, one can compute f(N|pa, pb) for different signal models M, in particular, for models M that differ by the value of a single parameter. Then compute the probability of model M:
Pr(M|N) = ∫dpa ∫dpb f(N|pa, pb, M) π(pa, pb|M) π(M) / π(N)
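As a sketch of how such a fit can be carried out numerically, the fragment below marginalizes a binned Poisson likelihood over the two scale factors on a grid, with flat priors; the five-bin templates are invented numbers, purely for illustration. Repeating the evidence calculation for signal templates that differ by one parameter and normalizing over the set of models gives Pr(M|N).

```python
import numpy as np
from scipy import stats, integrate

# Hypothetical 5-bin templates: signal A, background Bg, observed counts N
A  = np.array([0.5, 2.0, 5.0, 2.0, 0.5])     # signal model
Bg = np.array([20.0, 15.0, 10.0, 8.0, 5.0])  # background model
N  = np.array([22, 18, 16, 9, 5])            # data

# Model: N_i ~ Poisson(a*A_i + b*Bg_i); marginalize the scale factors
# a and b over flat priors, using a grid as a quadrature stand-in
a_grid = np.linspace(1e-3, 5.0, 300)
b_grid = np.linspace(1e-3, 3.0, 300)
aa, bb = np.meshgrid(a_grid, b_grid, indexing="ij")
mu = aa[..., None] * A + bb[..., None] * Bg            # shape (na, nb, M)
loglike = stats.poisson.logpmf(N, mu).sum(axis=-1)     # sum over bins

# Marginal likelihood f(N|M), up to the prior normalization
like = np.exp(loglike - loglike.max())
evidence = integrate.trapezoid(integrate.trapezoid(like, b_grid, axis=1), a_grid)
print(evidence * np.exp(loglike.max()))
```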

Bayesian Fit – Results (DØ, 1997)
[Plot: P(M|N) vs. top quark mass hypothesis (GeV)]
mass = … ± 4.5 GeV, signal = 33 ± 8, background = 50.8 ± 8.3

Finding Needles in Haystacks

The Needles
[Plot: single top quark events; cross sections 0.88 pb and 1.98 pb]

The Haystacks
[Plot: W boson events; 2700 pb]
signal : noise = 1 : 1000

The Needles and the Haystacks

Finding Needles – 1
The optimal solution is to compute
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/noise discrimination method is ultimately an algorithm to approximate p(S|x), or a function thereof.
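To make this concrete, here is a toy sketch in which the class densities are assumed known (one-dimensional Gaussians, invented for illustration), so p(S|x) can be computed exactly from Bayes' theorem; the 1:1000 class priors echo the haystack slide above.

```python
from scipy import stats

p_x_S = stats.norm(2.0, 1.0)   # assumed signal density p(x|S)
p_x_B = stats.norm(0.0, 1.0)   # assumed background density p(x|B)
p_S, p_B = 0.001, 0.999        # class priors: signal : noise = 1 : 1000

def p_S_given_x(x):
    num = p_x_S.pdf(x) * p_S
    return num / (num + p_x_B.pdf(x) * p_B)

# posterior probability that an event observed at x = 4 is signal
print(p_S_given_x(4.0))
```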

Finding Needles – 2
Problem: Given a data set D = x (= x1, …, xN), y (= y1, …, yN) of N labeled events, where x are the data and y are the labels, find a function f(x, w), with parameters w, that approximates p(S|x):
p(w|x, y) = p(x, y|w) p(w) / p(x, y)
          = p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
          = p(y|x, w) p(w) / p(y|x),   assuming p(x|w) = p(x)

Finding Needles – 3
Likelihood for classification:
p(y|x, w) = ∏i f(xi, w)^yi [1 − f(xi, w)]^(1−yi)
where y = 0 for background events and y = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
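As an illustration of this likelihood, the sketch below fits a simple logistic model f(x, w) by maximizing p(y|x, w) on a made-up one-dimensional sample. For these toy densities (signal x ~ N(2,1), background x ~ N(0,1), equal priors) the exact posterior is p(S|x) = 1/(1 + exp(−(2x − 2))), so the fitted weights can be checked against it; all names and numbers here are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy labeled sample: y = 1 (signal, x ~ N(2,1)), y = 0 (background, x ~ N(0,1))
n = 2000
y = rng.integers(0, 2, n).astype(float)
x = rng.normal(loc=2.0 * y, scale=1.0)

def f(x, w):
    # logistic model; adequate here because the exact p(S|x) is a sigmoid in x
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1] * x)))

def neg_log_likelihood(w):
    p = np.clip(f(x, w), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

w_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x
print(w_hat)   # should approach roughly (-2, 2)
```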

Finding Needles – 4
However, in a Bayesian calculation it is more natural to average with respect to the posterior density:
f(x|D) = ∫ f(x, w) p(w|D) dw
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?

Answer 1: Yes!
[Diagram: neural network f(x, w) with inputs x1, x2, hidden-layer parameters (u, a), and output parameters (v, b)]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: Rⁿ → R.
The parameters w = (u, a, v, b) are called weights.
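A minimal sketch of such a function, with one hidden layer of tanh units and a sigmoid output; the layer sizes and random weights are invented for illustration:

```python
import numpy as np

def nn(x, w):
    """Single-hidden-layer network f(x, w) with weights w = (u, a, v, b)."""
    u, a, v, b = w
    h = np.tanh(u @ x + a)                      # hidden layer
    return 1.0 / (1.0 + np.exp(-(v @ h + b)))   # sigmoid output in (0, 1)

# Example: 2 inputs, 3 hidden nodes, random weights
rng = np.random.default_rng(0)
w = (rng.normal(size=(3, 2)), rng.normal(size=3),
     rng.normal(size=3), rng.normal())
print(nn(np.array([1.0, -0.5]), w))
```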

Answer 2: Yes!
Computational Method
Generate a Markov chain of K points {w} whose stationary density is p(w|D), and average over the stationary part of the chain.
Map the problem to that of a “particle” moving in a spatially varying “potential” and use the methods of statistical mechanics to generate states (p, w) with probability ∝ exp(−H), where H = p²/2 − log p(w|D) is the “Hamiltonian” and p is the “momentum.”

Hybrid Markov Chain Monte Carlo
Computational method, continued: For fixed H, traverse the (p, w) space using Hamilton's equations, which guarantees that all points consistent with H will be visited with equal probability. To allow exploration of states with differing values of H, one periodically introduces random changes to the momentum p.
Software: Flexible Bayesian Modeling (FBM) by Radford Neal
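Below is a self-contained sketch of the idea, not Neal's FBM itself: for brevity it uses the two-parameter logistic model from the earlier toy example in place of a full neural network, so the gradients stay short. The step size, trajectory length, prior width, and data are all invented. It draws a chain from p(w|D) by integrating Hamilton's equations (leapfrog) at fixed H, resampling the momentum between trajectories, and then forms the posterior-averaged prediction f(x|D) from the previous slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy labeled sample: y = 1 (signal, x ~ N(2,1)), y = 0 (background, x ~ N(0,1))
n = 500
y = rng.integers(0, 2, n).astype(float)
x = rng.normal(loc=2.0 * y, scale=1.0)
X = np.column_stack([np.ones(n), x])           # design matrix: bias + input

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def U(w):
    """Potential energy: -log p(w|D), with a Gaussian N(0, 10) prior on w."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    log_like = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    log_prior = -0.5 * np.sum(w**2) / 10.0
    return -(log_like + log_prior)

def grad_U(w):
    p = sigmoid(X @ w)
    return -(X.T @ (y - p)) + w / 10.0

def hmc_step(w, eps=0.01, n_leapfrog=50):
    p = rng.normal(size=w.shape)               # resample momentum: changes H
    H0 = U(w) + 0.5 * p @ p
    w_new = w.copy()
    p_new = p - 0.5 * eps * grad_U(w_new)      # leapfrog: initial half step
    for _ in range(n_leapfrog):
        w_new = w_new + eps * p_new
        p_new = p_new - eps * grad_U(w_new)
    p_new = p_new + 0.5 * eps * grad_U(w_new)  # convert last full step to half
    H1 = U(w_new) + 0.5 * p_new @ p_new
    return w_new if rng.random() < np.exp(H0 - H1) else w  # Metropolis accept

w = np.zeros(2)
chain = []
for i in range(2000):
    w = hmc_step(w)
    if i >= 500:                               # discard burn-in
        chain.append(w)

# Posterior-averaged prediction f(x|D) at a test point x = 1.5
x_test = np.array([1.0, 1.5])                  # bias term + input value
f_avg = np.mean([sigmoid(x_test @ w) for w in chain])
print(f_avg)
```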

Example – Finding SUSY!
[Plot: transverse momentum spectra; signal shown as the black curve]
Signal : noise = 1 : 25,000

Example – Finding SUSY!
[Plot: distribution of f(x|D) beyond 0.9]
Assuming L = 10 fb⁻¹:
Cut    S       B       S/√B
0.99   1×10³   2×10…   …
Signal : noise = 1 : 20

Summary
• Bayesian methods have been at the heart of several important results in particle physics.
• However, there is considerable room for expanding their domain of application.
• A couple of current issues:
  – Is there a signal? Is the Bernardo approach useful in particle physics?
  – Fitting: Is there a practical (Bayesian?) method to test whether or not an N-dimensional function fits an N-dimensional swarm of points?