Concluding Talk: Physics
Gary Feldman, Harvard University
PHYSTAT 05, University of Oxford, 15 September 2005
Topics
• I will restrict my comments to two topics, both of which I am interested in and both of which received some attention at this meeting:
  • Event classification
  • Nuisance parameters
Event Classification
• The problem: Given a measurement of an event X = (x_1, x_2, …, x_n), find the function F(X) that returns 1 if the event is signal (s) and 0 if the event is background (b), so as to optimize a figure of merit such as the signal significance.
Theoretical Solution
• In principle the solution is straightforward: Use a Monte Carlo simulation to calculate the likelihood ratio L_s(X)/L_b(X) and derive F(X) from it. By the Neyman-Pearson theorem, this is the optimum solution.
• Unfortunately, this does not work because of the "curse of dimensionality." In a high-dimensional space, even the largest data set is sparse, with the distance between neighboring events comparable to the radius of the space.
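In one dimension, where the densities can be written down, the likelihood-ratio rule is easy to sketch. The example below is only an illustration: the unit-width Gaussian signal and background densities, the helper names, and the cut value of 1 are all assumptions, not anything taken from the talk, and analytic densities stand in for the Monte Carlo estimate.

```python
import math

def gauss_pdf(x, mean, sigma):
    """Normal density, used here as a stand-in for a simulated likelihood."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(x):
    """L_s(x) / L_b(x) for an assumed signal N(+1, 1) and background N(-1, 1)."""
    return gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, -1.0, 1.0)

def F(x, cut=1.0):
    """Return 1 (signal) if the likelihood ratio exceeds the cut, else 0."""
    return 1 if likelihood_ratio(x) > cut else 0

# For these two densities the ratio reduces to exp(2x),
# so a cut at 1 is the same as a cut at x = 0.
labels = [F(x) for x in (-0.5, 0.5)]
```

Moving the cut trades signal efficiency against background rejection. In many dimensions the densities themselves cannot be estimated from a finite simulated sample, which is the difficulty the slide describes.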
Practical Solutions
• Thus, we are forced to substitute cleverness for brute force.
• In recent years, physicists have come to learn that computers may be cleverer than they are.
• They have turned to machine learning: One gives the computer samples of signal and background events and lets the computer figure out what F(X) is.
Artificial Neural Networks
• Originally most of this effort was in artificial neural networks (ANN). Although used successfully in many experiments, ANNs tend to be finicky and often require real cleverness from their creators.
• At this conference, there was an advance in ANNs reported by Harrison Prosper. The technique is to average over a collection of networks. Each network is constructed by sampling the weight probability density constructed from the training sample.
Trees and Rules
• In the past couple of years, interest has started to shift to other techniques, such as decision trees, at least partially sparked by Jerry Friedman's talk at PHYSTAT 03.
• A single decision tree has limited power, but its power can be increased by techniques that effectively sum many trees.
[Figure: a cartoon from Roe's talk]
Rules and Bagging Trees
• Jerry Friedman gave a talk on rules, which effectively combines a series of trees.
• Harrison Prosper gave a talk (for Ilya Narsky) on bagging (Bootstrap AGGregatING) trees. In this technique, one builds a collection of trees by selecting a sample of the training data and, optionally, a subset of the variables.
Results on significance of B → γeν at BaBar:
  Method                    Significance
  Single decision tree      2.16
  Boosted decision trees    2.62 (not optimized)
  Bagging decision trees    2.99
Boosted Decision Trees
• Byron Roe gave a talk on the use of boosted trees in MiniBooNE.
  • Misclassified events in one tree are given a higher weight and a new tree is generated.
  • Repeat to generate 1000 trees.
  • The final classifier is a weighted sum of all of the trees.
• Comparison to ANN: Also more robust.
[Figure: comparison to ANN, % of signal retained; panels for 52 variables and 21 variables]
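The reweighting loop described on this slide can be sketched in a few lines. The sketch below assumes AdaBoost-style weights, uses one-cut "stumps" in place of full decision trees, and labels signal and background as +1 and -1 rather than 1 and 0. These are all simplifications chosen for brevity. The function names and the toy data are hypothetical and are not taken from Roe's talk.

```python
import math

def stump_predict(stump, row):
    """A stump is (variable index, cut, sign): +sign above the cut, -sign below."""
    j, cut, sign = stump
    return sign if row[j] > cut else -sign

def fit_stump(X, y, w):
    """Scan all variables and cuts for the lowest weighted misclassification."""
    best_err, best_stump = None, None
    for j in range(len(X[0])):
        for cut in sorted(set(row[j] for row in X)):
            for sign in (1, -1):
                stump = (j, cut, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def boost(X, y, n_trees):
    """Build n_trees stumps, raising the weight of misclassified events each time."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_trees):
        err, stump = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)   # guard the logarithm
        alpha = 0.5 * math.log((1.0 - err) / err)  # weight of this stump in the sum
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return ensemble

def classify(ensemble, row):
    """Final classifier: sign of the weighted sum over all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

def accuracy(ensemble, X, y):
    return sum(classify(ensemble, row) == yi for row, yi in zip(X, y)) / len(X)

# Toy data: signal sits in the middle of one variable, background on both sides,
# so no single cut can separate them.
X = [[i] for i in range(-10, 11)]
y = [1 if abs(i) <= 4 else -1 for i in range(-10, 11)]
single = accuracy(boost(X, y, 1), X, y)
boosted = accuracy(boost(X, y, 3), X, y)
```

On this toy sample a single stump misclassifies one of the two background sidebands, while the weighted sum of three stumps separates the sample completely. A real application would use deeper trees, many more of them, and an independent test sample.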
Other Talks
• There were a couple of other talks on this subject by Puneet Sarda and Alex Gray, which I could not attend.
Nuisance Parameters
• Nuisance parameters are parameters with unknown true values for which coverage is required in a frequentist analysis.
• They may be statistical, such as the number of background events in a sideband used for estimating the background under a peak.
• They may be systematic, such as the shape of the background under the peak, or the error caused by the uncertainty of the hadronic fragmentation model in the Monte Carlo.
• Most experiments have a large number of systematic uncertainties.
New Concerns for the LHC
• Although the statistical treatment of these uncertainties is probably the analysis question that I have been asked the most, Kyle Cranmer has pointed out that these issues will be even more important at the LHC.
• If the statistical error is O(1) and the systematic error is O(0.1), then the systematic error will contribute as its square, or O(0.01), and it does not much matter how you treat it.
• However, at the LHC, we may have processes with 100 background events and 10% systematic errors. Even more critical, we want 5σ for a discovery level.
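The arithmetic behind these two cases can be made explicit. The sketch below assumes that the statistical and systematic errors are independent and add in quadrature, and that the statistical error on 100 background events is the Poisson value of 10. The function name is illustrative only.

```python
import math

def total_error(stat, syst):
    """Combine independent statistical and systematic errors in quadrature."""
    return math.sqrt(stat ** 2 + syst ** 2)

# Case 1: statistical error O(1), systematic error O(0.1).
small_syst = total_error(1.0, 0.1)       # the total barely moves from 1

# Case 2: 100 background events with a 10% systematic error.
background = 100.0
stat = math.sqrt(background)             # Poisson fluctuation: 10 events
syst = 0.10 * background                 # 10% of 100: also 10 events
lhc_like = total_error(stat, syst)       # about 14 events

inflation = lhc_like / stat              # about a factor sqrt(2)
```

In the first case the systematic error changes the total by about half a percent. In the second it is as large as the statistical error, so how it is treated matters.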
Why 5σ?
• LHC searches: 500 searches, each of which has 100 resolution elements (mass, angle bins, etc.) → 5 × 10^4 chances to find something.
• One experiment: False positive rate at 5σ: (5 × 10^4)(3 × 10^-7) = 0.015. OK.
• Two experiments:
  • Allowable false positive rate: 10. 2 (5 × 10^4)(1 × 10^-4) = 10 → 3.7σ required.
  • Required other experiment verification: (1 × 10^-3)(10) = 0.01 → 3.1σ required.
• Caveats: Is the significance real? Are there common systematic errors?
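The tail probabilities used above can be checked with the complementary error function. The sketch below assumes one-sided Gaussian tails, which matches the rounded values quoted on the slide. The variable names are illustrative.

```python
import math

def one_sided_p(n_sigma):
    """One-sided Gaussian tail probability beyond n_sigma standard deviations."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

n_chances = 500 * 100                      # searches times resolution elements

# One experiment at 5 sigma: expected number of false positives.
false_5sigma = n_chances * one_sided_p(5.0)            # roughly 0.014

# Two experiments, each allowed a 3.7 sigma threshold.
false_two_expts = 2 * n_chances * one_sided_p(3.7)     # roughly 10

# Each of those ~10 candidates must be confirmed at 3.1 sigma by the other experiment.
false_after_check = false_two_expts * one_sided_p(3.1)  # roughly 0.01
```

The exact tail probabilities differ slightly from the rounded 3 × 10^-7, 1 × 10^-4 and 1 × 10^-3 on the slide, but the conclusions are unchanged.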
A Cornucopia of Techniques
• At this meeting we have seen a wide range of techniques discussed for constructing confidence intervals in the presence of nuisance parameters.
• Everyone has expressed a concern that their methods cover, at least approximately. This appears to be important for LHC physics in light of Cranmer's concerns.
Bayesian with Coverage
• Joel Heinrich presented a decision by CDF to do Bayesian analyses with priors that cover.
  • Advantage is Bayesian conditioning with frequentist coverage.
  • Possibly the maximum amount of work for the experimenter.
• Example of coverage with a single Poisson with normalization and background nuisance parameters:
[Figure: coverage with flat priors]
Bayesian with Coverage
• Example of coverage with flat and with 1/ε and 1/b priors for a 4-channel Poisson with normalization and background nuisance parameters:
[Figure panels: flat priors; 1/ε and 1/b priors]
Frequentist/Bayesian Hybrid
• Fredrik Tegenfeldt presented a likelihood-ratio ordered (LR) Neyman construction after integrating out the nuisance parameters with flat priors. In a single-channel test, there was no undercoverage.
• What happens for a multi-channel case? My guess is that the confidence belt will be distorted by the use of flat priors, but that the method will still cover due to the construction.
• Cranmer considers a similar technique, as was used for LEP Higgs searches.
• Both are called "Cousins-Highland," although probably neither actually is.
Profile Likelihood
• 44 years ago, Kendall and Stuart told us how to eliminate nuisance parameters and do an LR construction:
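For reference, the profile likelihood ratio is conventionally written as below. The notation is assumed here rather than taken from the slide: θ stands for the parameter of interest, ν for the nuisance parameters, and L for the likelihood.

```latex
\lambda(\theta) =
  \frac{L\bigl(\theta,\ \hat{\hat{\nu}}(\theta)\bigr)}
       {L\bigl(\hat{\theta},\ \hat{\nu}\bigr)}
```

Here \hat{\theta} and \hat{\nu} maximize L over all parameters, while \hat{\hat{\nu}}(\theta) maximizes L over ν at fixed θ. The ratio depends only on θ, so it can serve as the ordering quantity in an LR Neyman construction.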
One (Minor) Problem
• The Kendall-Stuart prescription leads to the problem that for Poisson problems, as the nuisance parameter is better and better known, the confidence intervals do not converge to the limit of being perfectly known.
• The reason is that the introduction of a nuisance parameter breaks the discreteness of the Poisson distribution.
[Figure: from Punzi's talk]
One More Try
• Since this was referred to in a parallel session as "the Feldman problem" and since two plenary speakers made fun of my Fermilab Workshop plots, I will try to explain them again.
The Cousins-Highland Problem
• This correction also solves what Bob and I refer to as the Cousins-Highland problem (as opposed to method).
• Cousins and Highland turned to a Bayesian approach to calculate the effect of a normalization error because the frequentist approach gave an answer with the wrong sign.
• We now understand this was due to simply breaking the discreteness of the Poisson distribution.
• In one test case, using this correction reproduced the Cousins-Highland result x/ 2.
Use of Profile Likelihood
Wolfgang Rolke presented a talk on eliminating the nuisance parameters via profile likelihood, but with the Neyman construction replaced by the -Δ ln L hill-climbing approximation. This is also what MINUIT does.
The coverage is good with some minor undercoverage.
Cranmer also considers this method.
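A minimal sketch of this kind of interval follows. It assumes a counting experiment with n events in the signal region, expected to be s + b, and m events in a sideband, expected to be tau times b. The threshold of 1 on -2 Δ ln L is the usual asymptotic choice for an approximate 68% interval. The model, the function names and the grid scan are illustrative assumptions. This is not Rolke's code or MINUIT's algorithm.

```python
import math

def xlogy(x, y):
    """x * ln(y), with the convention that the term vanishes when x is 0."""
    return 0.0 if x == 0 else x * math.log(y)

def log_like(s, b, n, m, tau):
    """Poisson log-likelihood for n ~ Pois(s + b), m ~ Pois(tau * b), constants dropped."""
    return xlogy(n, s + b) - (s + b) + xlogy(m, tau * b) - tau * b

def profiled_b(s, n, m, tau):
    """Background that maximizes the likelihood at fixed s (root of a quadratic)."""
    k = 1.0 + tau
    c = k * s - n - m
    return (-c + math.sqrt(c * c + 4.0 * k * m * s)) / (2.0 * k)

def q(s, n, m, tau):
    """-2 ln of the profile likelihood ratio, with the signal restricted to s >= 0."""
    s_hat = max(0.0, n - m / tau)
    best = log_like(s_hat, profiled_b(s_hat, n, m, tau), n, m, tau)
    here = log_like(s, profiled_b(s, n, m, tau), n, m, tau)
    return -2.0 * (here - best)

def interval(n, m, tau, threshold=1.0, s_max=50.0, step=0.01):
    """Scan s on a grid and keep the range where q stays below the threshold."""
    n_steps = int(round(s_max / step))
    inside = [i * step for i in range(n_steps + 1)
              if q(i * step, n, m, tau) <= threshold]
    return inside[0], inside[-1]

# Example: 20 events in the signal region, 30 in a sideband three times as large,
# so the background estimate is 10 and the best-fit signal is 10.
low, high = interval(20, 30, 3.0)
```

The resulting interval is asymmetric about the best-fit signal of 10, as expected for Poisson data. Nothing in this construction guarantees coverage, which is why it needs to be checked.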
Full Neyman Constructions
• Both Giovanni Punzi and Kyle Cranmer attempted full Neyman constructions for both signal and nuisance parameters.
• I don't recommend you try this at home for the following reasons:
  • The ordering principle is not unique. Both Punzi and Cranmer ran into some problems.
  • The technique is not feasible for more than a few nuisance parameters.
  • It is unnecessary since removing the nuisance parameters through profile likelihood works quite well.
Cranmer's (Revised) Conclusions
• In Cranmer's talk, he had an unexpected result for the coverage of Rolke's method ("profile"). He did in fact have an error and it is corrected here:
Final Comments on Nuisance Parameters
• My preference is to eliminate at least the major nuisance parameters through profile likelihood and then do an LR Neyman construction. It is straightforward and has excellent coverage properties.
• However, whatever method you choose, you should check the coverage of the method.
• Cranmer makes the point that if you can check the coverage, you can also do a Neyman construction. I don't completely agree, but it is worth considering.
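As a small illustration of what checking coverage involves, the sketch below computes the exact coverage of the -2 Δ ln L ≤ 1 interval for a single Poisson mean. It has no nuisance parameters or background, which is a deliberate simplification. The function names and the scan range are assumptions made for this example.

```python
import math

def poisson_pmf(n, mu):
    """Probability of observing n events for a Poisson mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def q(mu, n):
    """-2 ln of the likelihood ratio for mean mu given n observed events."""
    if n == 0:
        return 2.0 * mu
    return 2.0 * (mu - n + n * math.log(n / mu))

def coverage(mu, threshold=1.0, n_max=200):
    """Sum the probability of every n whose interval would contain the true mu."""
    return sum(poisson_pmf(n, mu) for n in range(n_max + 1)
               if q(mu, n) <= threshold)

# At mu = 3 only n = 2, 3, 4 give intervals that contain mu.
cov_at_3 = coverage(3.0)   # about 0.616, below the nominal 0.683

# Scan the true mean to see how the coverage varies.
scan = [(0.5 * i, coverage(0.5 * i)) for i in range(1, 41)]
worst = min(scan, key=lambda pair: pair[1])
```

With nuisance parameters the same check has to be repeated over a grid of their true values, usually by Monte Carlo rather than by an exact sum.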