
Slide 1: TMVA Workshop – Introduction
Andreas Hoecker (CERN)
TMVA Workshop, 21 January 2011, CERN, Switzerland
http://tmva.sf.net

Slide 2: Goals of the Workshop
- Introduction to multivariate classification and regression with TMVA
- Pedagogical talks on various multivariate methods (morning)
- Talks from users (14:00)
- Tutorial (15:30)

Slide 3: TMVA
ROOT is the analysis framework used by most (HEP) physicists.
Idea: rather than just implementing new MVA techniques and making them available in ROOT,
- have one common platform / interface for all MVA methods
- have common data pre-processing capabilities
- train and test all classifiers on the same data sample and evaluate them consistently
- provide a common analysis (ROOT scripts) and application framework
- provide access with and without ROOT, through macros, C++ executables or Python

Slide 4: TMVA
- TMVA started in 2006 on the Sourceforge development platform
- 6 core developers, 21 contributors so far
- TMVA is written in C++ and relies on ROOT functionality
- Since ROOT 5.15 / TMVA v3.7.2, TMVA is part of ROOT and developed directly in the ROOT SVN
  - The primary tmva-users mailing list continues to be maintained on Sourceforge
  - New TMVA versions are also published as downloadable tgz files on Sourceforge
  - For bug reports, use ROOT Savannah

Slide 5: Simulated Higgs Event in CMS
A Higgs event in an LHC proton–proton collision at high luminosity (together with ~24 other inelastic events), 7 TeV LHC, 2010.
Such events occur in only a tiny fraction, O(10^-10), of the proton–proton collisions.

Slide 6: Event Classification in HEP
Most HEP analyses require discrimination of signal from background:
- Event level (Higgs searches, …)
- Cone level (tau-vs-jet reconstruction, …)
- Track level (particle identification, …)
- Object level (flavour tagging, …)
- Parameter estimation (significance, mass, CP violation in the B system, …)
The multivariate input information used for this has various sources:
- Kinematic variables (masses, momenta, decay angles, …)
- Event properties (jet/lepton multiplicity, sum of charges, …)
- Event shape (sphericity, Fox-Wolfram moments, …)
- Detector response (silicon hits, dE/dx, Cherenkov angle, shower profiles, muon hits, …)
Traditionally, few powerful input variables were combined. New methods allow using 100 and more variables without loss of classification power, e.g. MiniBooNE: NIM A 543 (2005), or D0 single top: Phys. Rev. D 78, 012005 (2008).

Slide 7: Event Classification
Suppose a data sample with two types of events, H0 and H1:
- We have found discriminating input variables x1, x2, …
- What decision boundary should we use to select events of type H1? Rectangular cuts? A linear boundary? A nonlinear one?
[Figure: the three boundary types, ranging from low-variance (stable), high-bias methods to high-variance, small-bias methods.]

Slide 8: Event Classification
Suppose a data sample with two types of events, H0 and H1:
- We have found discriminating input variables x1, x2, …
- What decision boundary should we use to select events of type H1? Rectangular cuts? A linear boundary? A nonlinear one?
How can we decide this in an optimal way? → Let the machine learn it!

Slide 9: Parameter Regression
How do we estimate a "functional behaviour" from a set of measurements?
- Energy deposit in the calorimeter, distance between overlapping photons, …
- Entry location of a particle in the calorimeter or on a silicon pad, …
[Figure: measurements f(x) vs. x. A constant? A linear function? A non-linear one?]
Looks trivial? What if we have many input variables?

Slide 10: Multivariate Event Classification

Slide 11: Multivariate Event Classification
Each event, signal or background, has D measured variables → an R^D "feature space".
Find a mapping from the D-dimensional input-observable ("feature") space to a one-dimensional output → class labels.

Slide 12: Multivariate Event Classification
Each event, signal or background, has D measured variables → an R^D "feature space".
Most general form: y = y(x), with x in R^D, where x = {x_1, …, x_D} are the input variables and y(x): R^D → R.
Find a mapping from the D-dimensional input-observable ("feature") space to a one-dimensional output → class labels. Plotting the resulting y(x) values gives the classifier-output distributions.

Slide 13: Multivariate Event Classification
y(x): R^D → R is a "test statistic" in the D-dimensional space of input variables.
Distributions of y(x): PDF_S(y) and PDF_B(y). The overlap of PDF_S(y) and PDF_B(y) limits the separation power and purity.
y(x) is used to set the selection cut:
- y(x) > cut: signal
- y(x) = cut: decision boundary
- y(x) < cut: background
y(x) = const is the surface defining the decision boundary → efficiency and purity. Classifiers are built such that y(B) → 0 and y(S) → 1.
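For reference, the efficiency and purity this slide refers to can be written out as follows (a sketch in the slide's notation; the cut value c and the abundances f_S, f_B are assumptions of the sketch):

```latex
% signal and background efficiencies for the selection y(x) > c
\varepsilon_S(c) = \int_c^{\infty} \mathrm{PDF}_S(y)\,\mathrm{d}y , \qquad
\varepsilon_B(c) = \int_c^{\infty} \mathrm{PDF}_B(y)\,\mathrm{d}y
% purity of the selected sample, assuming class abundances f_S + f_B = 1
p(c) = \frac{f_S\,\varepsilon_S(c)}{f_S\,\varepsilon_S(c) + f_B\,\varepsilon_B(c)}
```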

Slide 14: Multi-Class Classification
Binary classification: two classes, "signal" and "background".
[Figure: signal and background regions.]

Slide 15: Multi-Class Classification
Multi-class classification is a natural extension for many classifiers.
[Figure: six classes (Class 1 … Class 6) in feature space.]

Slide 16: Event Classification
P(Class = C | x), or simply P(C|x), is the probability that the event class is of type C, given the measured observables x = {x_1, …, x_D} → y(x). It is the posterior probability, built from:
- the probability density distribution according to the measurements x and the given mapping function,
- the prior probability to observe an event of class C, i.e. the relative abundance of "signal" versus "background",
- the overall probability density to observe the actual measurement y(x).
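These three ingredients combine via Bayes' theorem (a reconstruction of the formula the slide describes in words, not copied verbatim from it):

```latex
\underbrace{P\bigl(C \mid y(\mathbf{x})\bigr)}_{\text{posterior probability}}
= \frac{\overbrace{P\bigl(y(\mathbf{x}) \mid C\bigr)}^{\text{probability density for class } C}
        \cdot \overbrace{P(C)}^{\text{prior probability}}}
       {\underbrace{P\bigl(y(\mathbf{x})\bigr)}_{\text{overall probability density}}}
```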

Slide 17: Bayes Optimal Classification
x = {x_1, …, x_D}: measured observables; y = y(x).
The misclassification error is minimal if C is chosen such that P(C|y) is maximal. → To select S(ignal) over B(ackground), place the decision on the posterior odds ratio: the likelihood ratio as discriminating function y(x), AND the prior odds ratio of choosing a signal event (the relative probability of signal vs. background). The threshold "c" determines efficiency and purity. [Or any monotonic function of P(S|y) / P(B|y).]
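Spelled out, the decision rule this slide sketches reads (reconstruction in the slide's notation):

```latex
\frac{P(S \mid y)}{P(B \mid y)}
= \underbrace{\frac{P(y \mid S)}{P(y \mid B)}}_{\text{likelihood ratio}}
  \cdot
  \underbrace{\frac{P(S)}{P(B)}}_{\text{prior odds ratio}}
\;>\; c
\quad \Longrightarrow \quad \text{select as signal}
```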

Slide 18: Any Decision Involves a Risk
Decide to treat an event as "signal" or "background".
Trying to select signal events: i.e. try to disprove the null hypothesis stating it were "only" a background event.

  Truly is \ Accept as:   Signal         Background
  Signal                  correct        Type-2 error
  Background              Type-1 error   correct

Type-1 error: classify an event as class C even though it is not (accept a hypothesis although it is not true; reject the null hypothesis although it would have been the correct one) → loss of purity (in the selection of signal events).
Type-2 error: fail to identify an event from class C as such (reject a hypothesis although it would have been true; fail to reject / accept the null hypothesis although it is false) → loss of efficiency (in selecting signal events).
Significance α (Type-1 error rate, = p-value): α = the background selection "efficiency"; it should be small!
Size β (Type-2 error rate); power 1 − β = the signal selection efficiency.
"A" denotes the region where an event is called signal.
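In terms of the acceptance region A, the two error rates can be written as (a sketch consistent with the slide's definitions):

```latex
\alpha = \int_{A} P(\mathbf{x} \mid B)\,\mathrm{d}\mathbf{x}
\quad \text{(Type-1 error rate)} , \qquad
\beta = \int_{\bar{A}} P(\mathbf{x} \mid S)\,\mathrm{d}\mathbf{x}
\quad \text{(Type-2 error rate; power } = 1 - \beta\text{)}
```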

Slide 19: Neyman-Pearson Lemma
Neyman-Pearson: the likelihood ratio used as "selection criterion" y(x) gives, for each selection efficiency, the best possible background rejection, i.e. it maximises the area under the "Receiver Operating Characteristic" (ROC) curve.
[ROC figure: 1 − ε_backgr. vs. ε_signal; the diagonal corresponds to random guessing, curves further above it to good and better classification; the "limit" in the ROC curve is given by the likelihood ratio. One end of the curve has a small Type-1 and large Type-2 error, the other a large Type-1 and small Type-2 error.]
→ Varying the cut in y(x) > cut moves the working point (efficiency and purity) along the ROC curve.
How to choose the cut? → One needs to know the prior probabilities (S, B abundances):
- Measurement of a signal cross section: maximum of S/√(S+B), or equivalently √(ε·p)
- Discovery of a signal: maximum of S/√B
- Precision measurement: high purity (p)
- Trigger selection: high efficiency (ε), sometimes high background rejection
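As an illustration of picking the cross-section working point, here is a minimal ROOT sketch that scans the classifier-output histograms for the cut maximising S/√(S+B). The histograms hS and hB are assumptions of the sketch: signal and background y(x) distributions, already scaled to the expected event yields.

```cpp
// Sketch: find the cut on y(x) that maximises S/sqrt(S+B).
#include "TH1F.h"
#include "TMath.h"

double OptimalCut(const TH1F* hS, const TH1F* hB) {
  double bestCut = 0., bestFOM = -1.;
  const int nBins = hS->GetNbinsX();
  for (int i = 1; i <= nBins; ++i) {
    const double S = hS->Integral(i, nBins);    // signal passing y(x) > cut
    const double B = hB->Integral(i, nBins);    // background passing the cut
    if (S + B <= 0.) continue;
    const double fom = S / TMath::Sqrt(S + B);  // figure of merit for a cross-section measurement
    if (fom > bestFOM) {
      bestFOM = fom;
      bestCut = hS->GetXaxis()->GetBinLowEdge(i);
    }
  }
  return bestCut;
}
```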

Slide 20: Realistic Event Classification
Unfortunately, the true probability density functions are typically unknown: → Neyman-Pearson's lemma doesn't really help us… → supervised (machine) learning.
Use MC simulation, or more generally a set of known (already classified) "events". Use these "training" events to:
- Try to estimate the functional form of P(x|C), from which the likelihood ratio can be obtained, e.g. D-dimensional histograms, kernel density estimators, MC-based matrix-element methods, …
- Find a "discrimination function" y(x) and corresponding decision boundary (i.e. a hyperplane* in the "feature space": y(x) = const) that optimally separates signal from background, e.g. linear discriminants, neural networks, boosted decision trees, …
* A hyperplane in the strict sense goes through the origin; here, to be precise, an "affine set" is meant.


Slide 22: Multivariate Analysis Methods in TMVA
Examples of classifiers and regression methods:
- Rectangular cut optimisation
- Projective and multidimensional likelihood estimator
- k-Nearest-Neighbour algorithm
- Fisher and H-Matrix discriminants
- Function discriminants
- Artificial neural networks
- Boosted decision trees
- RuleFit
- Support Vector Machine
Preprocessing methods:
- Decorrelation, Principal Component Decomposition, Gaussianisation
Examples of synthesis methods:
- Boosting, Categorisation (valid for all methods, and their combinations)
[Presenters named on slide: Joerg, Jan, Helge, Eckhard, Peter.]
A booking sketch for the common method interface follows below.
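The following sketch illustrates how these methods share one booking interface; the output-file name, option strings and the elided data-loading calls are illustrative assumptions, not taken from the slides.

```cpp
// Sketch: booking several TMVA methods through the common Factory interface.
#include "TFile.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void book_methods() {
  TFile* out = TFile::Open("TMVA.root", "RECREATE");
  TMVA::Factory factory("TMVAClassification", out, "!V:AnalysisType=Classification");
  // ... AddVariable / AddSignalTree / AddBackgroundTree calls go here ...

  // Same interface for every method; only the type and option string change.
  factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "!H:!V");
  factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "!H:!V");
  factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "!H:!V:HiddenLayers=N+5");
  factory.BookMethod(TMVA::Types::kBDT,        "BDT",        "!H:!V:NTrees=400");
}
```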

Slide 23: We Have a Users Guide!
Available at http://tmva.sf.net: TMVA Users Guide, 142 pp., incl. code examples, arXiv:physics/0703039.

Slide 24: Using TMVA (→ TMVA tutorial)
A typical TMVA analysis consists of two main steps:
1. Training phase: training, testing and evaluation of classifiers using data samples with known signal and background composition
2. Application phase: using selected trained classifiers to classify unknown data samples
A sketch of the training phase follows below; an application-phase sketch follows after slide 26.
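A minimal sketch of the training phase (step 1); the file, tree and variable names are illustrative assumptions, not taken from the slides.

```cpp
// Sketch: TMVA training phase (train, test, evaluate).
#include "TFile.h"
#include "TTree.h"
#include "TCut.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void train() {
  TFile* input = TFile::Open("data.root");         // assumed input file
  TTree* sig = (TTree*)input->Get("TreeS");        // assumed tree names
  TTree* bkg = (TTree*)input->Get("TreeB");

  TFile* out = TFile::Open("TMVA.root", "RECREATE");
  TMVA::Factory factory("TMVAClassification", out, "!V:AnalysisType=Classification");

  factory.AddVariable("var1", 'F');                // assumed input variables
  factory.AddVariable("var2", 'F');
  factory.AddSignalTree(sig, 1.0);
  factory.AddBackgroundTree(bkg, 1.0);
  factory.PrepareTrainingAndTestTree("", "SplitMode=Random:!V");

  factory.BookMethod(TMVA::Types::kBDT, "BDT", "!H:!V:NTrees=400");

  factory.TrainAllMethods();     // training
  factory.TestAllMethods();      // testing on the independent test sample
  factory.EvaluateAllMethods();  // evaluation; results are written to TMVA.root
  out->Close();
}
```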

Slide 25: Code Flow for Training and Application (→ TMVA tutorial)
[Figure: code flow for the training and application phases.]

Slide 26: Code Flow for Training and Application (→ TMVA tutorial)
The user code can consist of ROOT scripts, C++ executables or Python scripts (via PyROOT), or any other high-level language that interfaces with ROOT. An application-phase sketch follows below.
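A minimal sketch of the application phase (step 2) with TMVA::Reader; the variable names, weight-file path and dummy values are illustrative assumptions.

```cpp
// Sketch: TMVA application phase with the Reader.
#include "TMVA/Reader.h"

void apply() {
  float var1, var2;                    // buffers the Reader reads from
  TMVA::Reader reader("!Color:!Silent");
  reader.AddVariable("var1", &var1);   // must match the training setup and order
  reader.AddVariable("var2", &var2);
  reader.BookMVA("BDT", "weights/TMVAClassification_BDT.weights.xml");

  // In the event loop: fill var1, var2 from the data, then evaluate.
  var1 = 0.5f; var2 = -1.2f;           // dummy values for illustration
  double y = reader.EvaluateMVA("BDT"); // classifier output y(x)
  (void)y;
}
```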

Slide 27: Strong Methods Need Strong Evaluation (→ TMVA tutorial)
- A lot of evaluation information is already provided in the logging output of the training
- Simple GUIs provide access to evaluation plots and tools for single- and multi-class classification and regression

Slide 28: Involved Methods Need Thorough Evaluation (→ TMVA tutorial)
[Evaluation plots.]

Slide 29: Involved Methods Need Thorough Evaluation (→ TMVA tutorial)
[Further evaluation plots.]

Slide 30: Involved Methods Need Thorough Evaluation (→ TMVA tutorial)
[Evaluation plots; average no. of nodes before/after pruning: 4193 / 968.]

Slide 31: Join the tutorial this afternoon! (→ TMVA tutorial)

Slide 32: Two Wishes to Users (→ TMVA tutorial)

Slide 33:
Multivariate training samples often have distinct sub-populations of data:
- A detector element may only exist in the barrel, but not in the endcaps
- A variable may have different distributions in the barrel, overlap and endcap regions
Ignoring this dependence may reduce performance and creates correlations between variables, which must be learned by the classifier.
- Classifiers such as the projective likelihood, which do not account for correlations, significantly lose performance if the sub-populations are not separated
Categorisation means splitting the data sample into categories defining disjoint data samples with the following (idealised) properties:
- Events belonging to the same category are statistically indistinguishable
- Events belonging to different categories have different properties

Slide 34: MethodCategory is Your Friend!
It provides fully transparent support for categorisation of your input data, applicable to any TMVA method → see Peter's talk later today. A booking sketch follows below.
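A sketch of booking a categorised classifier, following the pattern of the TMVA example TMVAClassificationCategory.C; the variable names and the barrel/endcap split on |eta| are illustrative assumptions.

```cpp
// Sketch: categorised classification with TMVA::MethodCategory.
#include "TMVA/Factory.h"
#include "TMVA/MethodBase.h"
#include "TMVA/MethodCategory.h"
#include "TMVA/Types.h"

void book_category(TMVA::Factory& factory) {
  TMVA::MethodBase* cat =
      factory.BookMethod(TMVA::Types::kCategory, "FisherCat", "");
  TMVA::MethodCategory* mcat = dynamic_cast<TMVA::MethodCategory*>(cat);

  // One Fisher discriminant per detector region, trained independently.
  mcat->AddMethod("abs(eta)<=1.3", "var1:var2:var3", TMVA::Types::kFisher,
                  "Fisher_Barrel", "!H:!V:Fisher");
  mcat->AddMethod("abs(eta)>1.3",  "var1:var2:var3", TMVA::Types::kFisher,
                  "Fisher_Endcap", "!H:!V:Fisher");
}
```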

Slide 35:
The HEP community already has a lot of experience with MVA classification:
- In particular for rare-event searches, O(all) mature experiments use it
- It increases the experiment's sensitivity, and may reduce systematic errors owing to a smaller background component
- MVAs are not black boxes, but (possibly involved) R^n → R mapping functions
We should acquire more experience in HEP with multivariate regression:
- Our calibration schemes are often still quite simple: linear or simple functions, look-up-table based, mostly depending on few variables (e.g., η, p_T)
- Non-linear multivariate regression may significantly improve calibrations and applied corrections, in particular if it is possible to train from data
- Available since TMVA 4 for: LD, FDA, k-NN, PDERS, PDEFoam, MLP, BDT (a regression sketch follows below)
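A sketch of a TMVA regression setup, following the pattern of the TMVA example TMVARegression.C; the tree, variable and target names are illustrative assumptions.

```cpp
// Sketch: single-target regression with TMVA.
#include "TFile.h"
#include "TTree.h"
#include "TCut.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void train_regression() {
  TFile* input = TFile::Open("data.root");
  TTree* regTree = (TTree*)input->Get("TreeR");    // assumed tree name

  TFile* out = TFile::Open("TMVAReg.root", "RECREATE");
  TMVA::Factory factory("TMVARegression", out, "!V:AnalysisType=Regression");

  factory.AddVariable("var1", 'F');                // assumed input variables
  factory.AddVariable("var2", 'F');
  factory.AddTarget("fvalue");                     // quantity to estimate
  factory.AddRegressionTree(regTree, 1.0);
  factory.PrepareTrainingAndTestTree("", "SplitMode=Random:!V");

  factory.BookMethod(TMVA::Types::kMLP, "MLP", "!H:!V:HiddenLayers=N+5");
  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();
  out->Close();
}
```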

Slide 36: Status & Outlook
- 2-class classification: supported by all methods
- Multi-class classification: supported by MLP (NN), BDTG, FDA
- Single-target regression: PDE-RS, PDE-Foam, k-NN, LD, FDA, MLP, BDT
- Multi-target regression: PDE-Foam, k-NN, MLP
- All methods support categorised classification and generalised boosting
Priorities on the to-do list for future releases:
- Automatic self-optimisation of parameter settings for all methods; for this: full support of cross-validation
- Increased support of multi-dimensional classification and regression
- Individual improvements of methods (see, e.g., the MLP talk by Jan)
- Introduction of unsupervised learning

Slide 37: Copyrights & Credits
Several similar data-mining efforts exist, with rising importance in most fields of science and industry. Important for HEP:
- Parallelised MVA training and evaluation pioneered by the Cornelius package (BaBar)
- Also frequently used: the StatPatternRecognition package by I. Narsky (Caltech)
- Many implementations of individual classifiers exist
TMVA is open-source software: use and redistribution of the source are permitted according to the terms of the BSD license.
Contributors to TMVA: Andreas Hoecker (CERN, Switzerland), Jörg Stelzer (CERN, Switzerland), Peter Speckmayer (CERN, Switzerland), Jan Therhaag (Universität Bonn, Germany), Eckhard von Toerne (Universität Bonn, Germany), Helge Voss (MPI für Kernphysik Heidelberg, Germany), Moritz Backes (Geneva University, Switzerland), Tancredi Carli (CERN, Switzerland), Asen Christov (Universität Freiburg, Germany), Or Cohen (CERN, Switzerland and Weizmann, Israel), Krzysztof Danielowski (IFJ and AGH/UJ, Krakow, Poland), Dominik Dannheim (CERN, Switzerland), Sophie Henrot-Versille (LAL Orsay, France), Matthew Jachowski (Stanford University, USA), Kamil Kraszewski (IFJ and AGH/UJ, Krakow, Poland), Attila Krasznahorkay Jr. (CERN, Switzerland, and Manchester U., UK), Maciej Kruk (IFJ and AGH/UJ, Krakow, Poland), Yair Mahalalel (Tel Aviv University, Israel), Rustem Ospanov (University of Texas, USA), Xavier Prudent (LAPP Annecy, France), Arnaud Robert (LPNHE Paris, France), Doug Schouten (S. Fraser U., Canada), Fredrik Tegenfeldt (Iowa University, USA, until Aug 2007), Alexander Voigt (CERN, Switzerland), Kai Voss (University of Victoria, Canada), Marcin Wolter (IFJ PAN Krakow, Poland), Andrzej Zemla (IFJ PAN Krakow, Poland).

Slide 38: A Few References
Software packages for multivariate data analysis / classification:
- Individual classifier software, e.g. "JETNET" by C. Peterson, T. Rögnvaldsson and L. Lönnblad, and many, many other packages!
- "All-inclusive" packages:
  - StatPatternRecognition: I. Narsky, arXiv:physics/0507143, http://www.hep.caltech.edu/~narsky/spr.html
  - TMVA: Hoecker, Speckmayer, Stelzer, Therhaag, von Toerne, Voss, arXiv:physics/0703039, http://tmva.sf.net or every ROOT distribution
  - WEKA: http://www.cs.waikato.ac.nz/ml/weka/
  - Huge data-analysis library available in "R": http://www.r-project.org/
Literature:
- T. Hastie, R. Tibshirani, J. Friedman, "The Elements of Statistical Learning", Springer 2001
- C.M. Bishop, "Pattern Recognition and Machine Learning", Springer 2006
Conferences: PHYSTAT, ACAT, …

