Published by Gabriella O'Leary. Modified over 4 years ago.

1
Signal/Background Discrimination, Harrison B. Prosper, SAMSI, March 2006

Signal/Background Discrimination in Particle Physics
Harrison B. Prosper, Florida State University
SAMSI, 8 March 2006

2
Outline
Particle Physics Data
Signal/Background Discrimination
Summary

3
Particle Physics Data
proton + anti-proton -> positron (e+), neutrino (ν), Jet1, Jet2, Jet3, Jet4
This event is described by (at least) 3 + 2 + 3 x 4 = 17 measured quantities.

4
Particle Physics Data
H_0: Standard Model
H_1: Model of the Week
[Figure residue: labels "1" and "10^6"]

5
Signal/Background Discrimination
To minimize the misclassification probability, compute
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/background discrimination method is ultimately an algorithm to approximate this function, or a mapping thereof.
p(S) / p(B) is the prior signal-to-background ratio, that is, it is S/B before applying a cut to p(S|x).
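As a concrete illustration of the formula above, here is a minimal sketch that evaluates p(S|x) directly from Bayes' theorem. The two unit-width Gaussian class densities and the function names are assumptions made for the example; they are not from the talk.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_signal(x, prior_s=0.5):
    """p(S|x) from Bayes' theorem for two hypothetical Gaussian classes."""
    p_x_s = gaussian_pdf(x, mean=1.0, sigma=1.0)   # assumed p(x|S)
    p_x_b = gaussian_pdf(x, mean=-1.0, sigma=1.0)  # assumed p(x|B)
    prior_b = 1.0 - prior_s
    return p_x_s * prior_s / (p_x_s * prior_s + p_x_b * prior_b)
```

At x = 0 the two assumed densities are equal, so `posterior_signal(0.0, prior_s)` simply returns the prior, which shows how the prior ratio p(S)/p(B) sets the baseline before any cut is applied.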

6
Signal/Background Discrimination
Given: D = (x, y), with x = {x_1, …, x_N} and y = {y_1, …, y_N}, a set of N training examples (events)
Infer: a discriminant function f(x, w), with parameters w
$p(w|x, y) = p(x, y|w)\, p(w) / p(x, y)$
$= p(y|x, w)\, p(x|w)\, p(w) / [p(y|x)\, p(x)]$
$= p(y|x, w)\, p(w) / p(y|x)$, assuming p(x|w) -> p(x)

7
Signal/Background Discrimination
A typical likelihood for classification:
$p(y|x, w) = \prod_i f(x_i, w)^{y_i} [1 - f(x_i, w)]^{1 - y_i}$
where y = 0 for background events and y = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
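The likelihood above is usually handled through its logarithm. The sketch below evaluates the log-likelihood and, as a toy check, maximizes it over the simplest possible model, a constant f = c. The function name, the clipping constant, and the four-event sample are assumptions for illustration only.

```python
import numpy as np

def log_likelihood(f_values, y):
    """log p(y|x, w) = sum_i [ y_i log f_i + (1 - y_i) log(1 - f_i) ]."""
    # Clip to avoid log(0) when the model output saturates.
    f = np.clip(np.asarray(f_values, dtype=float), 1e-12, 1.0 - 1e-12)
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(f) + (1.0 - y) * np.log(1.0 - f)))

# Toy sample: three signal events (y = 1) and one background event (y = 0).
y = [1, 1, 1, 0]
grid = np.linspace(0.01, 0.99, 99)
scores = [log_likelihood(np.full(len(y), c), y) for c in grid]
best = grid[int(np.argmax(scores))]   # constant f that maximizes the likelihood
```

For a constant model the maximum sits at the signal fraction of the sample, which is the x-independent analogue of f converging to p(S|x).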

8
Signal/Background Discrimination
However, in a Bayesian calculation it is more natural to average:
$y(x) = \int f(x, w)\, p(w|D)\, dw$
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?
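In practice an integral of this kind is usually approximated by a finite average over parameter points drawn from the posterior. A sketch, assuming M draws w_1, …, w_M from p(w|D) (the symbol M and the indexing are introduced here for illustration):

```latex
y(x) = \int f(x, w)\, p(w \mid D)\, dw
\;\approx\; \frac{1}{M} \sum_{k=1}^{M} f(x, w_k),
\qquad w_k \sim p(w \mid D)
```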

9
Answer 1: Yes!
Hilbert's 13th problem: prove a special case of the conjecture that the following is impossible, in general:
$f(x_1, \ldots, x_n) = F(g_1(x_1), \ldots, g_n(x_n))$
In 1957, Kolmogorov proved the contrary. A function f: R^n -> R can be represented as follows:
$f(x_1, \ldots, x_n) = \sum_{i=1}^{2n+1} Q_i\left( \sum_{j=1}^{n} G_{ij}(x_j) \right)$
where the G_ij are independent of f(.).

10
Kolmogorov Functions
[Figure: network diagram with inputs x_1, x_2, hidden-layer parameters (u, a), output parameters (v, b), and output n(x, w)]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^n -> R.
The parameters w = (u, a, v, b) are called weights.
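A minimal sketch of such a network with a single hidden layer follows. The tanh hidden units, the logistic output, and the array shapes are assumptions (the slide only names the weights u, a, v, b); the weight count, however, reproduces the 641 parameters quoted later for the (14, 40, 1) network of Example 2.

```python
import numpy as np

def n_params(n_in, n_hidden):
    """Number of weights in an (n_in, n_hidden, 1) network: u, a, v and b."""
    return n_in * n_hidden + n_hidden + n_hidden + 1

def network(x, u, a, v, b):
    """n(x, w) = sigmoid(b + sum_j v_j tanh(a_j + sum_i u_ji x_i)).

    x: inputs, shape (n_in,)        u: shape (n_hidden, n_in)
    a, v: shape (n_hidden,)         b: scalar
    """
    hidden = np.tanh(a + u @ x)                      # hidden-layer activations
    return 1.0 / (1.0 + np.exp(-(b + v @ hidden)))   # output in (0, 1)
```

For example, `n_params(1, 15)` gives the size of the parameter space for the (1, 15, 1) model class used in Example 1.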

11
Answer 2: Yes!
Computational Method
Generate a Markov chain (MC) of N points {w}, whose stationary density is p(w|D), and average over the last M points.
Map the problem into that of a particle moving in a spatially varying potential and use methods of statistical mechanics to generate states (p, w) with probability ~ exp(-H), where H is the Hamiltonian H = -log p(w|D) + p^2, with momentum p. (The minus sign makes exp(-H) proportional to p(w|D) times a Gaussian in p.)

12
Hybrid Markov Chain Monte Carlo
Computational Method…
For a fixed H, traverse the space (p, w) using Hamilton's equations, which guarantees that all points consistent with H will be visited with equal probability ~ exp(-H).
To allow exploration of states with differing values of H, one periodically introduces random changes to the momentum p.
Software: Flexible Bayesian Modeling by Radford Neal
http://www.cs.utoronto.ca/~radford/fbm.software.html
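The two ingredients described above (Hamiltonian trajectories at fixed H, plus periodic momentum refreshes) can be sketched as follows. This is an illustrative sketch, not the FBM implementation: the leapfrog integrator, the Metropolis acceptance step, the kinetic term p·p/2 (a common convention that may differ by a constant factor from the slide), and all function and parameter names are assumptions.

```python
import numpy as np

def hmc_sample(log_post, grad_log_post, w0, n_steps, eps=0.1, n_leap=20, seed=0):
    """Draw n_steps points whose stationary density is exp(log_post(w)).

    w0 must be a 1-D sequence; log_post and grad_log_post take 1-D arrays.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    chain = []
    for _ in range(n_steps):
        p = rng.standard_normal(w.shape)          # random momentum refresh
        h_old = -log_post(w) + 0.5 * p @ p        # H = -log p(w|D) + p.p/2
        w_new, p_new = w.copy(), p.copy()
        # Leapfrog integration of Hamilton's equations.
        p_new += 0.5 * eps * grad_log_post(w_new)
        for step in range(n_leap):
            w_new += eps * p_new
            if step < n_leap - 1:
                p_new += eps * grad_log_post(w_new)
        p_new += 0.5 * eps * grad_log_post(w_new)
        h_new = -log_post(w_new) + 0.5 * p_new @ p_new
        # Metropolis test corrects for the finite-step integration error.
        if np.log(rng.uniform()) < h_old - h_new:
            w = w_new
        chain.append(w.copy())
    return np.array(chain)
```

A usage sketch on a toy target: `hmc_sample(lambda w: -0.5 * w @ w, lambda w: -w, [3.0, -3.0], n_steps=2000)` samples a 2-D standard normal; for a network, `log_post` would be the log of likelihood times prior.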

13
Example 1

14
Example 1: 1-D
Signal: p + pbar -> tqb
Background: p + pbar -> Wbb
NN model class: (1, 15, 1)
MCMC: 500 tqb + Wbb events. Use the last 20 points in a chain of 10,000, skipping every 20th.
[Figure: distributions of x for tqb and Wbb]

15
Example 1: 1-D
[Figure: p(S|x) versus x]
Dots: p(S|x) = H_S / (H_S + H_B), where H_S and H_B are 1-D histograms
Curves: individual NNs n(x, w_k)
Black curve: [label lost in extraction]

16
Example 2

17
Example 2: 14-D (Finding Susy!)
[Figure: transverse momentum spectra]
Signal: black curve
Signal/Noise: 1/25,000

18
Example 2: 14-D (Finding Susy!)
[Figure: missing transverse momentum spectrum (caused by escape of neutrinos and Susy particles)]
Measured quantities: 4 x (E_T, η, φ) + (E_T, φ) = 14

19
Example 2: 14-D (Finding Susy!)
[Figure labels: Likelihood, Prior]
Signal: 250 p + pbar -> gluino, gluino (Susy) events
Background: 250 p + pbar -> top, anti-top events
NN model class: (14, 40, 1) (w ∈ 641-D parameter space!)
MCMC: Use the last 100 networks in a Markov chain of 10,000, skipping every 20.

20
Results
Network distribution beyond n(x) > 0.9
Assuming L = 10 fb^-1

Cut    S        B        S/√B
0.90   5x10^3   2x10^6   3.5
0.95   4x10^3   7x10^5   4.7
0.99   1x10^3   2x10^4   7.0
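The last column of the results can be checked with a few lines of arithmetic. The sketch below assumes the table is read as (cut, S, B) with the final column being S/√B; that reading is an inference from the numbers, since the column header is damaged in the source.

```python
import math

# (cut on n(x), S, B) as read from the results table; the reading is an assumption.
rows = [(0.90, 5e3, 2e6), (0.95, 4e3, 7e5), (0.99, 1e3, 2e4)]

def significance(s, b):
    """Approximate significance S / sqrt(B)."""
    return s / math.sqrt(b)

for cut, s, b in rows:
    print(f"n(x) > {cut:.2f}: S/sqrt(B) = {significance(s, b):.2f}")
```

The computed values are about 3.54, 4.78 and 7.07, which agree with the quoted 3.5, 4.7 and 7.0 to within rounding, whereas the plain ratio S/B would be far below 1 in every row.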

21
But Does It Really Work?
Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B).
A properly trained classifier y(x) approximates p(S|x) = p(x|S) / [p(x|S) + p(x|B)].
Therefore, if the data (signal + background) are weighted with y(x), we should recover the signal density.
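The reasoning is that d(x) y(x) = N [p(x|S) + p(x|B)] x p(x|S) / [p(x|S) + p(x|B)] = N p(x|S). A toy numerical check is sketched below; the two Gaussian densities, the sample size, and the use of the exact p(S|x) in place of a trained classifier are all assumptions for illustration.

```python
import numpy as np

def normal_pdf(x, mean, sigma):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
N = 5000
signal = rng.normal(1.0, 0.7, N)              # draws from the assumed p(x|S)
background = rng.normal(-0.5, 1.5, N)         # draws from the assumed p(x|B)
data = np.concatenate([signal, background])   # 2N events, p(S) = p(B)

# The exact p(S|x) stands in for a properly trained classifier y(x).
p_s = normal_pdf(data, 1.0, 0.7)
p_b = normal_pdf(data, -0.5, 1.5)
y = p_s / (p_s + p_b)

# Weighted data histogram versus the histogram of the signal events alone.
bins = np.linspace(-4.0, 4.0, 41)
weighted_hist, _ = np.histogram(data, bins=bins, weights=y)
signal_hist, _ = np.histogram(signal, bins=bins)
```

Up to statistical fluctuations, `weighted_hist` follows `signal_hist`, and the sum of the weights is close to N, the number of signal events.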

22
But Does It Really Work?
It seems to!

23
Example 3

24
Particle Physics Data, Take 2
Two varieties of jet:
1. Tagged (Jet 1, Jet 4)
2. Untagged (Jet 2, Jet 3)
We are often interested in Pr(Tagged | Jet Variables).

25
Example 3: Tagging Jets
[Figure: tagged jet, untagged jet, collision point]
p(T|x) = p(x|T) p(T) / d(x)
d(x) = p(x|T) p(T) + p(x|U) p(U)
x = (P_T, η, φ)
[Plot: p(x|T) or d(x); the red curve is d(x)!]

26
Probability Density Estimation
Approximate a density by a sum over kernels K(.), one placed at each of the N points x_i of the training sample.
h is one or more smoothing parameters adjusted to provide the best approximation to the true density p(x).
If h is too small, the model will be very spiky; if h is too large, features of the density p(x) will be lost.
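A minimal 1-D sketch of such an estimate is given below, assuming a Gaussian kernel and the common normalization (1/N) Σ K((x - x_i)/h)/h; the slide's own formula is not preserved in this transcript, so both choices are assumptions.

```python
import numpy as np

def kde_1d(x, sample, h):
    """Kernel density estimate (1/N) sum_i K((x - x_i)/h) / h, Gaussian K."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    z = (x[:, None] - sample[None, :]) / h       # shape (len(x), N)
    kernels = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)                  # average over training points
```

Evaluating `kde_1d` on a grid with a very small h gives a sum of narrow spikes at the training points, while a very large h flattens the estimate toward a single broad bump.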

27
Probability Density Estimation
Why does this work? Consider the limit as N -> ∞ of
[equation missing in source]
In the limit N -> ∞, the true density p(x) will be recovered provided that h -> 0 in such a way that
[condition missing in source]

28
Probability Density Estimation
As long as the kernel behaves sensibly in the N -> ∞ limit, any kernel will do. In practice, the most commonly used kernel is the product of 1-D Gaussians, one for each dimension i:
[equation missing in source]
One advantage of the PDE approximation is that it contains very few adjustable parameters: basically, the smoothing parameters.
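A product-of-Gaussians kernel in several dimensions can be sketched as follows. The per-dimension smoothing parameters h_i and the normalization are the standard textbook form and are assumptions here, since the slide's own expression did not survive extraction.

```python
import numpy as np

def kde_product(x, sample, h):
    """Density estimate at one point x using a product of 1-D Gaussian kernels.

    x: shape (d,)   sample: shape (N, d)   h: scalar or shape (d,)
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    h = np.broadcast_to(np.asarray(h, dtype=float), x.shape)  # one h per dimension
    z = (x - sample) / h                                      # shape (N, d)
    # Sum of log 1-D Gaussians = log of their product.
    log_k = -0.5 * np.sum(z ** 2, axis=1) - np.sum(np.log(h * np.sqrt(2.0 * np.pi)))
    return float(np.mean(np.exp(log_k)))
```

The only adjustable quantities are the entries of `h`, which is the sense in which the method has very few parameters.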

29
Example 3: Tagging Jets
Projections of estimated p(T|x) (black curve) onto the P_T, η and φ axes.
Blue points: ratio of blue to red histograms (see slide 25)

30
Example 3: Tagging Jets
Projections of data weighted by p(T|x). Recovers the tagged-jet density p(x|T).

31
But, How Well Does It Work?
How well do the n-D model and the n-D data agree?
A thought (JL, HBP):
1. Project the model and the data onto the same set of randomly directed rays through the origin.
2. Compute some measure of discrepancy for each pair of projections.
3. Do something sensible with this set of numbers!!
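Steps 1 and 2 can be sketched as below. The talk leaves the discrepancy measure open; the two-sample Kolmogorov-Smirnov distance used here, the representation of the model by a set of points, and all function names are assumptions chosen only to make the idea concrete. Step 3 is left open, as in the talk.

```python
import numpy as np

def random_rays(n_rays, dim, rng):
    """Unit vectors with directions uniform on the sphere."""
    v = rng.standard_normal((n_rays, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a - F_b|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ray_discrepancies(model_points, data_points, n_rays=10, seed=0):
    """Project both point sets onto the same random rays and compare each pair."""
    rng = np.random.default_rng(seed)
    rays = random_rays(n_rays, model_points.shape[1], rng)
    return [ks_distance(model_points @ r, data_points @ r) for r in rays]
```

Each entry of the returned list is a number between 0 and 1, one per ray; identical point sets give 0 on every ray.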

32
But, How Well Does It Work?
Projections of p(T|x) onto 3 randomly chosen rays through the origin.

33
But, How Well Does It Work?
Projections of weighted tagged + untagged data onto the 3 randomly selected rays.

34
Summary
Multivariate methods have been applied with considerable success in particle physics, especially for classification. However, there is considerable room for improving our understanding of them as well as expanding their domain of application.
The main challenge is data/model comparison when each datum is a point in 1…20 dimensions.
During the SAMSI workshop we hope to make some progress on the use of projections onto multiple rays. This may be an interesting area for collaboration between physicists and statisticians.
