Statistical Tools In Dzero, PHYSTAT Workshop 2005, Harrison B. Prosper. Statistical Software In DØ: The Good, the Bad and the Non-Existent.


1 Statistical Tools In Dzero, PHYSTAT Workshop 2005, Harrison B. Prosper
Statistical Software In DØ: The Good, the Bad and the Non-Existent
Harrison B. Prosper, Florida State University
PHYSTAT Workshop 2005, 15 August 2005

2 Outline
• Analysis Example
• Available Software
• Wish List
• Summary

3 Example - DØ Single Top Group – I

4 Example - DØ Single Top Group – II
• Search for p + pbar → t + (q) + b + X
• 8 signal channels
• 7 background sources per signal channel
  - QCD, ttbar(lj), ttbar(ll), Wjj, Wbb, WW, WZ
• Each data bin is the sum of
  - tb, tqb, QCD, ttbar(lj), ttbar(ll), Wjj, Wbb, WW, WZ

5 Example - DØ Single Top Group – III
• Basic Statistical Quantity: Binned Likelihood
• Goal
  - To measure σ_s and σ_t
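As a rough illustration of the binned likelihood named on this slide, the sketch below writes the expected count in each bin as two signal terms, each scaled by one of the quantities to be measured, plus a fixed background, and sums Poisson log-probabilities over bins. Reading the two measured quantities as cross sections σ_s and σ_t for the tb and tqb signals is an assumption, and the names `tb_per_pb`, `tqb_per_pb` and `background` are hypothetical; none of this is taken from the DØ code.

```python
import math


def binned_log_likelihood(observed, sigma_s, sigma_t,
                          tb_per_pb, tqb_per_pb, background):
    """Poisson log-likelihood summed over histogram bins.

    Assumption of this sketch: the expected count in bin i is
        sigma_s * tb_per_pb[i] + sigma_t * tqb_per_pb[i] + background[i]
    where the *_per_pb lists are expected signal counts per unit cross section.
    """
    log_l = 0.0
    for n, a_s, a_t, b in zip(observed, tb_per_pb, tqb_per_pb, background):
        mean = sigma_s * a_s + sigma_t * a_t + b
        # log of Poisson(n | mean) = n log(mean) - mean - log(n!)
        log_l += n * math.log(mean) - mean - math.lgamma(n + 1)
    return log_l
```

In a real analysis the background term would itself be a sum over the sources listed earlier, each with its own uncertainty; here it is collapsed into one fixed number per bin to keep the example short.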

6 Example – Statistical Problems – IV
• Background Modeling
  - Model/data comparisons in multiple dimensions to determine the region with the “best” match.
  - Background events are generally weighted, for example, by the probability that an event could contain a b-jet.
  - These “tag-rate functions” are the results of fits to 2–3 dimensional empirical densities.

7 Example – Statistical Problems – V
• Discriminant Variable Selection
  - From a list of potentially useful variables, select the “best” subset.
• Multivariate Analyses
  - Random Grid Search
  - Neural Networks
  - Decision Trees
  - Bayesian Neural Networks
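To make the first multivariate method concrete, here is a minimal sketch of a random grid search: instead of scanning a regular grid of cut values, candidate cuts are taken from the signal events themselves, and each candidate is scored by how many signal and background events survive it. The cut form (`x_k >= c_k` on every variable) and the figure of merit `s / sqrt(s + b)` are assumptions made for this example; it is not the RGSearch code.

```python
import math
import random


def random_grid_search(signal, background, n_cuts=200, seed=1):
    """Return (best_cut, best_figure_of_merit).

    signal, background : lists of events, each a tuple of variable values.
    Candidate cut points are sampled from the signal events, which is the
    defining idea of a random grid search.
    """
    rng = random.Random(seed)
    candidates = rng.sample(signal, min(n_cuts, len(signal)))

    def passes(event, cut):
        # An event survives if every variable is at or above its cut value.
        return all(x >= c for x, c in zip(event, cut))

    best_cut, best_fom = None, -1.0
    for cut in candidates:
        s = sum(passes(e, cut) for e in signal)
        b = sum(passes(e, cut) for e in background)
        fom = s / math.sqrt(s + b) if s + b > 0 else 0.0
        if fom > best_fom:
            best_cut, best_fom = cut, fom
    return best_cut, best_fom
```

Because the candidates come from the signal sample, the search concentrates automatically on the region where signal actually lives, which is what makes the approach workable in several dimensions.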

8 Example – Statistical Problems – VI
• Posterior Density Computation
  - Must marginalize over hundreds of variables (acceptances and background yields), taking known dependencies into account.
• Analysis Validation
  - Ideally, the entire analysis is run repeatedly on fake data sets to study its frequency behavior.
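The marginalization step can be sketched for a toy counting experiment with just two nuisance parameters. The example below averages a Poisson likelihood over sampled values of an effective acceptance and a background yield, then normalizes the result on a grid of signal cross sections. The truncated Gaussian priors, the flat prior on the cross section, and all parameter names are assumptions of this sketch; the two nuisance parameters are sampled independently, so the known dependencies mentioned on the slide are not modeled here.

```python
import math
import random


def marginal_posterior(n_obs, sigma_grid, eff_lumi, eff_lumi_err,
                       bkg, bkg_err, n_samples=2000, seed=7):
    """Posterior density of sigma on an evenly spaced grid.

    Expected count = sigma * a + b, with nuisance parameters a (acceptance
    times luminosity) and b (background) integrated out by Monte Carlo.
    """
    rng = random.Random(seed)
    nuisances = []
    while len(nuisances) < n_samples:
        a = rng.gauss(eff_lumi, eff_lumi_err)
        b = rng.gauss(bkg, bkg_err)
        if a > 0 and b > 0:  # truncate the Gaussians at zero
            nuisances.append((a, b))

    density = []
    for sigma in sigma_grid:
        total = 0.0
        for a, b in nuisances:
            mean = sigma * a + b
            total += math.exp(n_obs * math.log(mean) - mean
                              - math.lgamma(n_obs + 1))
        density.append(total / n_samples)

    # Normalize with a flat prior on sigma over the grid range.
    step = sigma_grid[1] - sigma_grid[0]
    norm = sum(density) * step
    return [d / norm for d in density]
```

With hundreds of correlated nuisance parameters the same idea applies, but the samples would have to be drawn jointly so that the dependencies are preserved.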

9 Available Software – I
• Fitting
  - Minuit applied to Root histograms
  - PoissonGammaFit (more later!)
• Multivariate Methods
  - RGSearch (a few incompatible versions)
  - Jetnet (v3.4) (with C++ binding)
  - MLPfit (several versions)
  - oo_neural (OOP version of BP)

10 Available Software – II
• Multivariate Methods (continued)
  - Classifier (decision tree)
  - C2.4 (decision tree)
  - TerraFerma (misc. methods)
  - BNN (Bayesian NN)
• Limit Setting
  - top_statistics (Bayes, CLs)
  - blimit (Bayes – more robust version of DØ web-calculator)

11 Available Software – III
• Adaptive Numerical Integration
  - AdBayes (C++ binding of Alan Genz’s Fortran code)
• Python Bindings
  - RGSearch, Jetnet, AdBayes, PoissonGammaFit, CLHEP, Coin, Root, etc.

12 PoissonGammaFit
• Model
  - For each bin i we write the (mean) data count d_i as a linear sum of N (mean) source counts
• Likelihood for observed distribution D = {D_i}

13 PoissonGammaFit – II
• Bayesian Inference for Moments m_r of p
• Prior (given source counts A_ji)
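The text of the two PoissonGammaFit slides names a model, a likelihood and a prior without giving their formulas. One plausible reading, suggested by the name of the tool and written in the slide's own symbols, is given below. The coefficients p_j, the mean source counts a_{ji}, and in particular the gamma form of the prior are assumptions of this sketch, not statements of what the DØ code implements.

```latex
% Mean count in bin i as a linear sum of N mean source counts a_{ji}
d_i = \sum_{j=1}^{N} p_j \, a_{ji}

% Poisson likelihood for the observed distribution D = \{D_i\}
L(D \mid p, a) = \prod_{i} \frac{d_i^{D_i} \, e^{-d_i}}{D_i!}

% Assumed gamma prior for each mean source count, given the observed count A_{ji}
\pi(a_{ji} \mid A_{ji}) = \frac{a_{ji}^{A_{ji}} \, e^{-a_{ji}}}{\Gamma(A_{ji} + 1)}

% Posterior moments of p after marginalizing over a, with prior \pi(p)
m_r = \frac{\int p^{r} \, L(D \mid p, a) \, \pi(a \mid A) \, \pi(p) \, da \, dp}
           {\int L(D \mid p, a) \, \pi(a \mid A) \, \pi(p) \, da \, dp}
```

Under this reading, r = 1 gives the posterior mean of p and the combination m_2 - m_1^2 gives its variance.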

14 What’s Available?
• C++ Class
  - PoissonGammaFit(vvdouble& A, vdouble& D, string prior="flat", bool scale=true, int total=10000)
• Methods
  - m = o.mean()
  - v = o.variance()
where vdouble = vector<double> and vvdouble = vector<vector<double> >
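For readers who want to see what a mean and a variance of this kind could look like numerically, the sketch below estimates them by sampling. It is an independent illustration, not the PoissonGammaFit class or its Python binding: the function name `posterior_moments`, the `p_max` parameter, the flat prior on a box, and the gamma sampling of the mean source counts are all assumptions.

```python
import math
import random


def posterior_moments(A, D, p_max=2.0, n_samples=10000, seed=1):
    """Estimate the posterior mean and variance of each p_j.

    A[j][i] : observed count of source j in bin i
    D[i]    : observed data count in bin i
    Assumptions: flat prior for each p_j on [0, p_max], and mean source
    counts a_ji drawn from Gamma(A_ji + 1, 1). Each sample is weighted by
    the Poisson likelihood of D (importance sampling from the prior).
    """
    rng = random.Random(seed)
    n_src, n_bins = len(A), len(D)
    draws, log_w = [], []
    for _ in range(n_samples):
        p = [rng.uniform(0.0, p_max) for _ in range(n_src)]
        log_l = 0.0
        for i in range(n_bins):
            d_i = sum(p[j] * rng.gammavariate(A[j][i] + 1.0, 1.0)
                      for j in range(n_src))
            d_i = max(d_i, 1e-300)  # guard against log(0)
            log_l += D[i] * math.log(d_i) - d_i - math.lgamma(D[i] + 1)
        draws.append(p)
        log_w.append(log_l)

    top = max(log_w)  # subtract the maximum to avoid underflow
    w = [math.exp(lw - top) for lw in log_w]
    norm = sum(w)
    mean = [sum(wk * p[j] for wk, p in zip(w, draws)) / norm
            for j in range(n_src)]
    var = [sum(wk * (p[j] - mean[j]) ** 2 for wk, p in zip(w, draws)) / norm
           for j in range(n_src)]
    return mean, var
```

Sampling from the prior is only practical for a handful of sources; with many sources most samples receive negligible weight and a smarter sampler would be needed.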

15 What’s Available? – II
• Main Program
  Usage: pgammafit
    -h
    -f [hist-file-list (histfile.list)]
    -n [# of sampling points (10000)]
    -o [name of plot (pgammafit.gif)]
• Uses
  - HistogramCache, PoissonGammaFit, Minuit


17 Bayesian Neural Networks
[Figure: network with inputs x1 and x2, hidden-layer parameters (u, a), output parameters (v, b), and output y(x, w)]
• w = (u, a, v, b) are the weights
• For binary (0, 1) classification, y(x, w) → p(1|x)
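A small sketch may help fix the notation w = (u, a, v, b). The function below evaluates a network with one hidden layer, and the second function averages its output over a set of weight vectors, which is the usual way a Bayesian neural network turns a sample from the posterior over weights into a single estimate of p(1|x). The tanh hidden units and the logistic output are conventional choices assumed here; the slide does not state which functions are used.

```python
import math


def network(x, w):
    """Single-hidden-layer network y(x, w) with w = (u, a, v, b).

    u : list of input-to-hidden weight vectors, one per hidden unit
    a : list of hidden-unit biases
    v : list of hidden-to-output weights
    b : output bias
    """
    u, a, v, b = w
    hidden = [math.tanh(a_j + sum(u_ji * x_i for u_ji, x_i in zip(u_j, x)))
              for u_j, a_j in zip(u, a)]
    z = b + sum(v_j * h_j for v_j, h_j in zip(v, hidden))
    return 1.0 / (1.0 + math.exp(-z))  # logistic output in (0, 1)


def bayesian_average(x, weight_samples):
    """Average y(x, w_n) over weight vectors w_n sampled from the posterior."""
    return sum(network(x, w) for w in weight_samples) / len(weight_samples)
```

In practice the weight vectors would come from an MCMC run over the posterior density of w given the training data, rather than being written by hand.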

18 Bayesian Neural Networks – II
[Figure: HT_AllJets_MinusBestJets]
• Dots: p(1|H_T) = H_tqb / (H_tqb + H_Wbb), where H is a 1-D histogram
• Curves: individual NNs y(H_T, w_n)
• Black curve


20 What’s Available?
• Radford Neal’s Package
  - C codes compiled and linked into a set of programs:
    net-spec      Specify network
    data-spec     Specify training data
    net-gen       Initialize network
    mc-spec       Specify MCMC parameters
    net-mc        Run MCMC
    net-display   Display network parameters
    netwrite.py   Write results to a C++ function

21 The Bad
• It’s a Jungle Out There!
  - Difficult to express ideas clearly
  - Tools typically cannot be moved easily from one framework to another
  - No clear protocol for interfacing between heterogeneous data formats
  - No algebra of histograms
  - Histograms tightly coupled to their viewers: Use Root or die!

22 The Bad – II
• Inadequate Support For:
  - Generating ensembles of observations, possibly with conditioning, to study bias, variance, coverage, etc.
  - Assessing robustness with respect to likelihoods and prior densities
  - Studying different confidence limit procedures
  - Studying different optimization criteria

23 The Non-Existent
• DØ has no (or inadequate) tools to:
  - Browse data in truly interesting ways
  - Perform goodness-of-fit tests that go beyond KS and χ²
  - Construct Bayesian models, systematically
  - Perform sensitivity analyses, systematically
• No domain-specific language

24 Wish List – I
• Free At Last!
  - Each statistical tool should be separate from, and independent of, the environment in which it might be used.
  - However, provide bindings for different environments/languages (R, Root, Ruby, Python, Java, etc.)
• Less Is More!
  - Each statistical tool should encapsulate a single coherent statistical idea.

25 Wish List – II
• Histograms
  - Histograms and histogram viewers should be independent of each other. (A sensible idea from Marc Paterno!)
  - Elegant algebra of histograms: h = a*h1 + b*h2/h3, etc.
  - Powerful, intuitive tools for multi-dim. data exploration
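The histogram algebra wished for on this slide is easy to prototype once the histogram is a plain data object with no viewer attached. The class below is a minimal sketch of that idea using operator overloading. The class name `Hist`, the choice to carry only bin contents (no bin edges or uncertainties), and the convention that a division by an empty bin yields zero are all assumptions made for the example.

```python
class Hist:
    """A viewer-independent 1-D histogram holding only its bin contents."""

    def __init__(self, counts):
        self.counts = [float(c) for c in counts]

    def _binwise(self, other, op):
        if len(other.counts) != len(self.counts):
            raise ValueError("histograms must have the same binning")
        return Hist(op(x, y) for x, y in zip(self.counts, other.counts))

    def __add__(self, other):
        return self._binwise(other, lambda x, y: x + y)

    def __truediv__(self, other):
        # Assumed convention: an empty denominator bin gives 0.
        return self._binwise(other, lambda x, y: x / y if y != 0 else 0.0)

    def __mul__(self, scale):
        # Scaling by a number, so that both h * a and a * h work.
        return Hist(scale * c for c in self.counts)

    __rmul__ = __mul__


h1, h2, h3 = Hist([1, 2]), Hist([4, 6]), Hist([2, 3])
h = 2 * h1 + 3 * h2 / h3  # the expression from the slide, with a = 2, b = 3
```

A usable version would also propagate bin uncertainties through each operation, which is where most of the real design work lies.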

26 Wish List – III
• Likelihoods
  - Flexible method for reporting them; perhaps as swarms of points generated via MCMC?
• Frequency Methods
  - Flexible ensemble generator, with easily extracted sub-ensembles
  - Flexible query of ensembles (to get coverage, error rates, variances, bias, etc.)
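A toy version of the ensemble generator and query wished for here might look like the sketch below. It generates Poisson pseudo-experiments, attaches a simple n ± sqrt(n) interval to each, and lets the caller ask for the coverage of those intervals on the whole ensemble or on a selected sub-ensemble. The interval recipe, the function names and the dictionary layout are assumptions chosen to keep the example short.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, prod = 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1


def generate_ensemble(true_mean, n_experiments, seed=1):
    """Return a list of pseudo-experiments with a naive interval each."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_experiments):
        n = poisson(rng, true_mean)
        half_width = math.sqrt(n)
        ensemble.append({"n": n, "lo": n - half_width, "hi": n + half_width})
    return ensemble


def coverage(ensemble, true_mean, select=None):
    """Fraction of (optionally selected) experiments whose interval
    contains the true mean."""
    chosen = [e for e in ensemble if select is None or select(e)]
    if not chosen:
        raise ValueError("selection left no experiments")
    covered = sum(e["lo"] <= true_mean <= e["hi"] for e in chosen)
    return covered / len(chosen)
```

For example, `coverage(ens, 10.0, select=lambda e: e["n"] >= 8)` queries only the sub-ensemble with at least eight observed events, which is the kind of conditioning the wish list asks for.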

27 Wish List – IV
• Bayesian Methods
  - Flexible robustness studies (prior family, likelihood family, etc.)
  - Multi-dimensional integration (adaptive and Markov Chain MC)
• Domain Specific Language
  - No dereferencing, auto_ptr, dynamic_cast, pointers, templates, etc., please… we’re British!

28 Summary
• The Good
  - Many statistical tools are in use at DØ
  - A lot more needed – opportunity for creativity!
• The Bad
  - Current tools are a reflection of non-interacting, idiosyncratic minds!
• The Non-Existent
  - Lack of a domain-specific language for the expression of statistical ideas. I don’t want to think about pointers and const-correctness when I’m trying to think about mathematics.

