G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1
Statistical Data Analysis: Lecture 7
1. Probability, Bayes’ theorem
2. Random variables and probability densities
3. Expectation values, error propagation
4. Catalogue of pdfs
5. The Monte Carlo method
6. Statistical tests: general concepts
7. Test statistics, multivariate methods
8. Goodness-of-fit tests
9. Parameter estimation, maximum likelihood
10. More maximum likelihood
11. Method of least squares
12. Interval estimation, setting limits
13. Nuisance parameters, systematic uncertainties
14. Examples of Bayesian approach

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 2
Nonlinear test statistics
The optimal decision boundary may not be a hyperplane → nonlinear test statistic. [Figure: nonlinear boundary between the acceptance regions for H0 and H1.] Multivariate statistical methods are a big industry, and particle physics can benefit from progress in machine learning: neural networks, support vector machines, kernel density methods, ...

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 3
Introduction to neural networks
Used in neurobiology, pattern recognition, financial forecasting, ... Here, neural nets are just a type of test statistic. Suppose we take t(x) to have the form
t(x) = s(a_0 + Σ_i a_i x_i), where s(a) = 1/(1 + e^(−a)) is the logistic sigmoid.
This is called the single-layer perceptron. Since s(·) is monotonic, it is equivalent to a linear t(x).
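A minimal numerical sketch of this test statistic (the weights a0 and a below are illustrative placeholders, not fitted values):

    import numpy as np

    def sigmoid(a):
        """Logistic sigmoid s(a) = 1 / (1 + e^(-a))."""
        return 1.0 / (1.0 + np.exp(-a))

    def perceptron(x, a0, a):
        """Single-layer perceptron test statistic t(x) = s(a0 + a . x)."""
        return sigmoid(a0 + np.dot(a, x))

    # Example with two input variables and made-up weights.
    x = np.array([1.2, -0.7])
    print(perceptron(x, a0=0.1, a=np.array([0.8, -0.5])))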

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 4
The multi-layer perceptron
Generalize from one layer to the multilayer perceptron. The values of the nodes in the intermediate (hidden) layer are
h_i(x) = s(w_i0^(1) + Σ_j w_ij^(1) x_j),
and the network output is given by
t(x) = s(w_10^(2) + Σ_j w_1j^(2) h_j(x)),
where the w are the weights (connection strengths).
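A sketch of the corresponding forward pass, assuming a single hidden layer with m nodes and n inputs:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def mlp(x, W1, b1, w2, b2):
        """One-hidden-layer perceptron.
        W1: (m, n) hidden weights, b1: (m,) hidden biases,
        w2: (m,) output weights, b2: scalar output bias."""
        h = sigmoid(b1 + W1 @ x)     # hidden-node values h_i(x)
        return sigmoid(b2 + w2 @ h)  # network output t(x)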

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 5
Neural network discussion
Easy to generalize to an arbitrary number of layers. Feed-forward net: the values of a node depend only on earlier layers, usually only on the previous layer (“network architecture”). More nodes → the neural net gets closer to the optimal t(x), but more parameters need to be determined. The parameters are usually determined by minimizing an error function, e.g.
E = E[(t(x) − t^(0))² | H0] + E[(t(x) − t^(1))² | H1],
where t^(0), t^(1) are target values, e.g., 0 and 1 for the logistic sigmoid. The expectation values are replaced by averages over the training data (e.g. MC). In general training can be difficult; standard software is available.
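A minimal sketch of such a fit by gradient descent for the single-layer perceptron above; in practice one uses standard training software, and the learning rate and iteration count here are arbitrary choices:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_perceptron(X, targets, lr=0.1, n_iter=2000):
        """Fit single-layer perceptron weights by gradient descent on the
        squared-error function, with expectation values replaced by
        averages over the training events.
        X: (N, n) training events; targets: (N,) values in {0, 1}."""
        rng = np.random.default_rng(0)
        a = rng.normal(scale=0.1, size=X.shape[1])
        a0 = 0.0
        for _ in range(n_iter):
            t = sigmoid(a0 + X @ a)            # network outputs
            g = (t - targets) * t * (1.0 - t)  # chain rule through sigmoid
            a -= lr * X.T @ g / len(X)
            a0 -= lr * g.mean()
        return a0, a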

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 6
Neural network example from LEP II
Signal: e+e− → W+W− (often 4 well separated hadron jets). Background: e+e− → qq̄gg (4 less well separated hadron jets). Input variables are based on jet structure, event shape, ...; none by itself gives much separation. The neural network output does better. (Garrido, Juste and Martinez, ALEPH)

G. Cowan Statistical methods for particle physics page 7
Some issues with neural networks
In the example with WW events, the goal was to select these events so as to study properties of the W boson. We needed to avoid using input variables correlated with the properties we eventually wanted to study (not trivial). In principle a single hidden layer with a sufficiently large number of nodes can approximate the optimal test variable (the likelihood ratio) arbitrarily well. Usually one starts with a relatively small number of nodes and increases it until the misclassification rate on a validation sample ceases to decrease. MC training data is usually cheap, so problems with getting stuck in local minima, overtraining, etc., are less important than concerns about systematic differences between the training data and Nature, and about the ease of interpreting the output.

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 8
Probability Density Estimation (PDE) techniques
See e.g. K. Cranmer, Kernel Estimation in High Energy Physics, CPC 136 (2001) 198, hep-ex/; T. Carli and B. Koblitz, A multi-variate discrimination technique based on range-searching, NIM A 501 (2003) 576, hep-ex/. Construct non-parametric estimators of the pdfs and use these to construct the likelihood ratio (an n-dimensional histogram is a brute-force example of this). More clever estimation techniques can get this to work in (somewhat) higher dimensions.

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 9
Kernel-based PDE (KDE, Parzen window)
Consider d dimensions and N training events x_1, ..., x_N, and estimate f(x) with
f̂(x) = (1/N) Σ_i (1/h^d) K((x − x_i)/h),
using e.g. a Gaussian kernel K(u) ∝ e^(−|u|²/2), where h is the kernel bandwidth (smoothing parameter). Evaluating the function requires summing N terms (slow); faster algorithms count only the events in the vicinity of x (k-nearest neighbour, range search).
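A direct sketch of this estimator with a normalized Gaussian kernel; the bandwidth h must be chosen by the user:

    import numpy as np

    def kde(x, data, h):
        """Gaussian kernel density estimate at point x.
        x: (d,) evaluation point; data: (N, d) training events; h: bandwidth."""
        d = data.shape[1]
        norm = (2.0 * np.pi) ** (d / 2.0) * h ** d
        u2 = np.sum((x - data) ** 2, axis=1) / h ** 2
        return np.mean(np.exp(-0.5 * u2)) / norm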

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 10
Product of one-dimensional pdfs
First rotate to uncorrelated variables, i.e., find a matrix A such that for y = Ax the covariance matrix of y is diagonal. Then estimate the d-dimensional joint pdf as the product of 1-d pdfs of the decorrelated variables,
f̂(y) = Π_i f̂_i(y_i).
This does not exploit non-linear features of the joint pdf, but it is simple and may be a good approximation in practical examples.
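A sketch of the rotation step, using the eigenvectors of the sample covariance matrix as the rows of A:

    import numpy as np

    def decorrelate(X):
        """Rotate variables so their sample covariance matrix is diagonal.
        X: (N, d) events; returns (A, Y) with Y = X A^T uncorrelated."""
        cov = np.cov(X, rowvar=False)
        _, eigvecs = np.linalg.eigh(cov)  # orthonormal eigenvectors (columns)
        A = eigvecs.T
        return A, X @ A.T

    # The joint pdf is then approximated as the product of 1-d estimates
    # (e.g. histograms or 1-d KDEs) in each decorrelated variable y_i.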

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 11
Decision trees
A training sample of signal and background events is repeatedly split by successive cuts on its input variables. The order in which the variables are used is based on the best separation between signal and background. Iterate until a stopping criterion is reached, based e.g. on purity or a minimum number of events in a node. The resulting set of cuts is a ‘decision tree’. It tends to be sensitive to fluctuations in the training sample. Example from MiniBooNE: B. Roe et al., NIM A 543 (2005) 577.
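A minimal sketch of the splitting step at one node, scanning a single variable for the cut that minimizes the Gini impurity (one common separation criterion; a full tree repeats this over all input variables and daughter nodes):

    import numpy as np

    def best_cut(x, is_signal):
        """Cut value on one variable giving the best signal/background
        separation, judged by the Gini index p(1-p) of the daughter nodes.
        x: (N,) variable values; is_signal: (N,) boolean labels."""
        def gini(labels):
            if len(labels) == 0:
                return 0.0
            p = np.mean(labels)
            return p * (1.0 - p)

        best_val, best_imp = None, np.inf
        for cut in np.unique(x):
            left, right = is_signal[x < cut], is_signal[x >= cut]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(x)
            if imp < best_imp:
                best_val, best_imp = cut, imp
        return best_val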

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 12
Boosted decision trees
Boosting combines a number of classifiers into a stronger one; it improves stability with respect to fluctuations in the input data. To use it with decision trees, increase the weights of misclassified events and reconstruct the tree. Iterate → a forest of trees (perhaps > 1000). For the mth tree, define a score α_m based on the error rate of that tree. The boosted classifier is the weighted sum of the trees:
T(x) = Σ_m α_m T_m(x).
Algorithms: AdaBoost (Freund and Schapire), ε-boost (Friedman).
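A compact AdaBoost sketch using decision stumps, with labels in {−1, +1}; edge cases (e.g. a tree with zero error rate) are omitted:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost(X, y, n_trees=100):
        """AdaBoost: reweight misclassified events and refit the tree.
        X: (N, d) events; y: (N,) labels in {-1, +1}."""
        w = np.ones(len(y)) / len(y)
        trees, alphas = [], []
        for _ in range(n_trees):
            tree = DecisionTreeClassifier(max_depth=1)
            tree.fit(X, y, sample_weight=w)
            pred = tree.predict(X)
            err = np.sum(w * (pred != y)) / np.sum(w)  # weighted error rate
            alpha = 0.5 * np.log((1.0 - err) / err)    # score of this tree
            w *= np.exp(-alpha * y * pred)             # boost misclassified events
            w /= w.sum()
            trees.append(tree)
            alphas.append(alpha)
        return trees, np.array(alphas)

    def boosted_t(x, trees, alphas):
        """Boosted classifier T(x) = sum_m alpha_m T_m(x)."""
        x = np.asarray(x).reshape(1, -1)
        return sum(a * t.predict(x)[0] for a, t in zip(alphas, trees))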

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 13
Multivariate analysis discussion
For all methods, need to check: sensitivity to statistically unimportant variables (best to drop those that don’t provide discrimination); level of smoothness in the decision boundary (sensitivity to overtraining). Given the test variable, the next step is, e.g., to select n events and estimate a cross section of signal, e.g.
σ̂_s = (n − b) / (ε ∫L dt),
with expected background b, signal efficiency ε and integrated luminosity ∫L dt. Now we need to estimate the systematic error. If e.g. the training (MC) data ≠ Nature, the test variable is not optimal, but not necessarily biased. But our estimates of the background b and the efficiency ε would then be biased if based on MC. (True also for ‘simple cuts’.)
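A worked example of this estimate; all numbers below are hypothetical, chosen only for illustration:

    # n selected events, expected background b, signal efficiency eps,
    # integrated luminosity L in pb^-1 -- all hypothetical.
    n, b, eps, L = 250, 180.0, 0.35, 100.0
    sigma_s = (n - b) / (eps * L)  # estimated signal cross section in pb
    print(f"sigma_s = {sigma_s:.2f} pb")  # -> 2.00 pb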

G. Cowan page 14
Overtraining
If the decision boundary is too flexible it will conform too closely to the training points → overtraining. Monitor by applying the classifier to an independent test sample. [Figure: the same boundary shown on the training sample and on an independent test sample.]
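A sketch of this check; clf stands for any trained classifier with a predict method (e.g. the trees above):

    import numpy as np

    def error_rate(clf, X, y):
        """Misclassification rate of a trained classifier on a sample."""
        return np.mean(clf.predict(X) != y)

    # Overtraining shows up as a training error that keeps decreasing
    # while the error on the independent test sample does not:
    # error_rate(clf, X_train, y_train)  <<  error_rate(clf, X_test, y_test)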

G. Cowan page 15
Using classifier output for discovery
[Figure: left, classifier output distributions f(y) for signal and background, normalized to unity; right, distribution N(y) of events, normalized to the expected numbers of events, with a possible excess above background in the search region y > y_cut.]
Discovery = the number of events found in the search region is incompatible with the background-only hypothesis. The p-value of the background-only hypothesis can depend crucially on the distribution f(y|b) in the search region.
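For a simple counting experiment in the search region, the background-only p-value can be sketched as follows (the numbers are hypothetical):

    from scipy.stats import poisson

    n_obs = 15  # observed events in the search region (hypothetical)
    b = 5.0     # expected background (hypothetical)
    # p-value of the background-only hypothesis: P(n >= n_obs | b)
    p = poisson.sf(n_obs - 1, b)
    print(f"p-value = {p:.2e}")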

G. Cowan SUSSP65, St Andrews, August 2009 / Statistical Methods 3 page 16
Single top quark production (CDF/D0)
The top quark was discovered in pair production, but the SM predicts single top production as well. Use many inputs based on jet properties, particle i.d., ... Pair-produced tops are now a background process. [Figure: classifier output distribution; signal shown in blue + green.]

G. Cowan Statistical methods for particle physics page 17
Different classifiers for single top
Also Naive Bayes and various approximations to the likelihood ratio, ... The final combined result is statistically significant (> 5σ level), but the classifier outputs are not easy to understand.

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 19 Comparing multivariate methods (TMVA) Choose the best one!

G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 21
Wrapping up lecture 7
We looked at statistical tests and related issues: discriminating between event types (hypotheses), determining selection efficiency, sample purity, etc. Some modern (and less modern) methods were mentioned: Fisher discriminants, neural networks, PDE, KDE, decision trees, ... Next we will talk about goodness-of-fit tests: the p-value expresses the level of agreement between the data and a hypothesis.