Multivariate Analysis: A Unified Perspective

Harrison B. Prosper, Florida State University
Advanced Statistical Techniques in Particle Physics, Durham, UK, 20 March 2002

Outline
- Introduction
- Some Multivariate Methods
  - Fisher Linear Discriminant (FLD)
  - Principal Component Analysis (PCA)
  - Independent Component Analysis (ICA)
  - Self-Organizing Map (SOM)
  - Random Grid Search (RGS)
  - Probability Density Estimation (PDE)
  - Artificial Neural Network (ANN)
  - Support Vector Machine (SVM)
- Comments
- Summary

Introduction – i
Multivariate analysis is hard! Our mathematical intuition, based on analysis in one dimension, often fails rather badly for spaces of very high dimension.
One should distinguish the problem to be solved from the algorithm to solve it. Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number, whereas algorithms to solve them are invented every day.

Introduction – ii
So why bother with multivariate analysis? Because the variables we use to describe events are usually statistically dependent.
Therefore, the N-dimensional density of the variables contains more information than is contained in the set of 1-d marginal densities f_i(x_i). This extra information may be useful.

Dzero 1995 Top Discovery

Introduction – iii
Problems that may benefit from multivariate analysis:
- Signal-to-background discrimination
- Variable selection (e.g., to give maximum signal/background discrimination)
- Dimensionality reduction of the feature space
- Finding regions of interest in the data
- Simplifying optimization
- Model comparison
- Measuring stuff (e.g., tanβ in SUSY)

Fisher Linear Discriminant (FLD)
Purpose: signal/background discrimination.
(The discriminant formula on the slide is an image; in it, g denotes a Gaussian density.)
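The slide's exact formula is not reproduced here; as a minimal sketch, the standard Fisher construction builds the weight vector from the class means and a pooled covariance matrix (the function name and inputs below are illustrative, not from the slides):

```python
import numpy as np

def fisher_discriminant(signal, background):
    """Fisher linear discriminant: w ~ Cov^-1 (mu_S - mu_B), applied as D(x) = w . x.
    signal, background: (n_events, n_vars) arrays of training events."""
    mu_s, mu_b = signal.mean(axis=0), background.mean(axis=0)
    cov = 0.5 * (np.cov(signal, rowvar=False) + np.cov(background, rowvar=False))
    w = np.linalg.solve(cov, mu_s - mu_b)   # weight vector
    return lambda x: x @ w                  # D(x): larger values are more signal-like
```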

Principal Component Analysis (PCA)
Purpose: reduce the dimensionality of the data.
(Figure: a 2-d scatter plot in the variables x1 and x2, showing the 1st and 2nd principal axes and the distance d_i of a point from the principal axis.)

PCA algorithm in practice
Transform from X = (x1, ..., xN)^T to U = (u1, ..., uN)^T, in which the lowest-order correlations are absent:
- Compute Cov(X).
- Compute its eigenvalues λ_i and eigenvectors v_i.
- Construct the matrix T whose rows are the eigenvectors v_i, and set U = TX.
Typically, one eliminates the components u_i with the smallest amount of variation.
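A minimal numpy sketch of these steps, assuming the events are stored as rows of an array X (the function name and arguments are illustrative):

```python
import numpy as np

def pca_transform(X, n_keep=None):
    """Rotate the data into principal components (U = T X in the slide's notation).
    X: (n_events, n_vars) array.  Returns the rotated data and the fraction of the
    total variance carried by each component, in decreasing order."""
    Xc = X - X.mean(axis=0)                 # center the variables
    cov = np.cov(Xc, rowvar=False)          # Cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    T = eigvecs.T                           # rows of T are the eigenvectors v_i
    U = Xc @ T.T                            # principal components
    if n_keep is not None:                  # drop components with least variation
        U = U[:, :n_keep]
    return U, eigvals / eigvals.sum()
```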

Independent Component Analysis (ICA)
Purpose: find statistically independent variables; dimensionality reduction.
Basic idea: assume X = (x1, ..., xN)^T is a linear mixture X = AS of independent sources S = (s1, ..., sN)^T. Both A, the mixing matrix, and S are unknown. Find a de-mixing matrix T such that the components of U = TX are statistically independent.
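The slides do not prescribe an implementation; as one illustration, scikit-learn's FastICA estimates a de-mixing matrix of this kind (the simulated sources and mixing matrix below are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Two independent sources S mixed by an (unknown in practice, here simulated) matrix A
n = 5000
S = np.column_stack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])          # mixing matrix
X = S @ A.T                         # observed, statistically dependent variables

ica = FastICA(n_components=2, random_state=0)
U = ica.fit_transform(X)            # estimated independent components (up to scale and order)
T = ica.components_                 # estimated de-mixing matrix (applied to the centred data)
```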

ICA Algorithm
Given two densities f(U) and g(U), one measure of their "closeness" is the Kullback-Leibler divergence
K(f | g) = ∫ f(U) ln[ f(U) / g(U) ] dU,
which is zero if, and only if, f(U) = g(U).
We set g(U) = ∏ g_i(u_i), the product of the marginal densities of the components of U, and minimize K(f | g) (now called the mutual information) with respect to the de-mixing matrix T.

Self-Organizing Map (SOM)
Purpose: find regions of interest in the data, that is, clusters; summarize the data.
Basic idea (Kohonen, 1988): map each of K feature vectors X = (x1, ..., xN)^T into one of M regions of interest defined by vectors w_m, so that all X mapped to a given w_m are closer to it than to all the remaining w_m. Basically, perform a coarse-graining of the feature space.
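A minimal numpy sketch of Kohonen-style coarse-graining; the number of units and the learning-rate and neighbourhood schedules are illustrative choices, not taken from the slides:

```python
import numpy as np

def train_som(X, n_units=16, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 1-d self-organizing map: coarse-grain the feature space into
    n_units prototype vectors w_m using Kohonen-style online updates."""
    rng = np.random.default_rng(seed)
    n_events, n_vars = X.shape
    W = X[rng.choice(n_events, n_units, replace=False)].astype(float)  # initial prototypes
    grid = np.arange(n_units)                  # positions of units on a 1-d map
    for t in range(n_iter):
        x = X[rng.integers(n_events)]          # pick a random feature vector
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
        lr = lr0 * np.exp(-t / n_iter)         # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)   # decaying neighbourhood width
        h = np.exp(-0.5 * ((grid - bmu) / sigma) ** 2)  # neighbourhood function
        W += lr * h[:, None] * (x - W)         # pull neighbouring prototypes toward x
    return W

# Each event is then summarized by the index of its nearest prototype:
# labels = np.argmin(np.linalg.norm(X[:, None, :] - W[None], axis=2), axis=1)
```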

Grid Search
Purpose: signal/background discrimination.
Apply cuts at each point of a grid in the feature variables; we refer to each grid point as a cut-point.
Number of cut-points ~ Nbin^Ndim, which grows rapidly: 10 bins in each of 5 variables already gives 10^5 = 100,000 cut-points.

Random Grid Search (RGS)
Take each point of the signal class as a cut-point.
For each cut-point, compute the fraction of events passing the cuts, Fraction = Ncut/Ntot, where Ntot = number of events before cuts and Ncut = number of events after cuts, for signal and background separately.
(Figure: signal fraction versus background fraction, each running from 0 to 1, one point per cut-point.)
H.B.P. et al., Proceedings, CHEP 1995
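A sketch of the idea in numpy, assuming for illustration that the cuts are one-sided (x_j > x_j_cut) in every variable; the slides do not fix the cut direction, and the function below is illustrative:

```python
import numpy as np

def random_grid_search(signal, background, n_points=None, seed=0):
    """Random grid search: use (a random subset of) the signal events themselves as
    cut-points; for each cut-point apply the cuts x_j > x_j_cut to both samples and
    record the surviving fractions Ncut/Ntot."""
    rng = np.random.default_rng(seed)
    if n_points is None:
        cuts = signal
    else:
        cuts = signal[rng.choice(len(signal), n_points, replace=False)]
    results = []
    for c in cuts:
        s_frac = np.mean(np.all(signal > c, axis=1))      # signal fraction after cuts
        b_frac = np.mean(np.all(background > c, axis=1))  # background fraction after cuts
        results.append((s_frac, b_frac))
    return np.array(results)  # one (signal fraction, background fraction) point per cut-point
```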

Probability Density Estimation (PDE)
Purpose: signal/background discrimination; parameter estimation.
Basic idea: estimate the densities directly from the data, for example by Parzen estimation (1960s) or by mixtures of simple densities.
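As a sketch of Parzen-style estimation, the scipy kernel density estimator can be used to approximate p(X|S) and p(X|B) and form a discriminant; the equal-prior combination below is an illustrative choice, not taken from the slides:

```python
import numpy as np
from scipy.stats import gaussian_kde

def pde_discriminant(signal, background, X):
    """Estimate p(X|S) and p(X|B) with Gaussian kernels placed on the training events,
    then form D(X) = p(X|S) / (p(X|S) + p(X|B)), which runs from 0 (background-like)
    to 1 (signal-like).  Equal class priors are assumed for simplicity."""
    kde_s = gaussian_kde(signal.T)       # gaussian_kde expects shape (n_vars, n_events)
    kde_b = gaussian_kde(background.T)
    ps, pb = kde_s(X.T), kde_b(X.T)      # estimated densities at the points X
    return ps / (ps + pb)
```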

Artificial Neural Networks (ANN)
Purpose: signal/background discrimination; parameter estimation; function estimation; density estimation.
Basic idea: encode the mapping from inputs to output using a set of 1-d functions (Kolmogorov, 1950s).

Feedforward Networks
(Figure: a network of input nodes, hidden nodes, and a single output node; the inset shows the node activation function f(a) plotted against a.)
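A minimal numpy sketch of the pictured architecture, taking a sigmoid as the node function f(a); the weight shapes and the choice of sigmoid are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def feedforward(x, W1, b1, w2, b2):
    """One hidden layer: input nodes -> hidden nodes (sigmoid) -> single output node.
    x: (n_inputs,), W1: (n_hidden, n_inputs), b1: (n_hidden,), w2: (n_hidden,), b2: scalar."""
    h = sigmoid(W1 @ x + b1)      # hidden-node activations f(a)
    return sigmoid(w2 @ h + b2)   # output in (0, 1); after training, approximately P(S|x)
```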

ANN Algorithm
Minimize the empirical risk function with respect to the network parameters w.
Solution (for large N): the network output converges to the conditional expectation E[t|x] of the target t.
If t(x) = kδ[1 − I(x)], where I(x) = 1 if x is of class k and 0 otherwise, the output therefore approximates the corresponding class probability.
D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296–298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303–305 (1990)
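The formulas on the slide were images; a standard reconstruction of the quoted result (Ruck et al. 1990; Wan 1990), writing n(x, w) for the network output and assuming the quadratic loss used in those papers, is:

```latex
R(w) = \frac{1}{N}\sum_{i=1}^{N}\bigl[\,t_i - n(x_i, w)\,\bigr]^2 ,
\qquad
\lim_{N\to\infty} n(x, \hat w) = E[\,t \mid x\,] .
% With targets t = 1 for signal and t = 0 for background, E[t|x] = P(S|x).
```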

Support Vector Machines (SVM)
Purpose: signal/background discrimination.
Basic idea: data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension. Use a linear discriminant to partition the high-dimensional feature space.

SVM – Kernel Trick
Or: how to cope with a possibly infinite number of parameters!
(Figure: points (x1, x2) labelled y = +1 and y = −1 that are not linearly separable in two dimensions become separable after mapping to coordinates (z1, z2, z3).)
In practice one tries different kernels, because the mapping itself is unknown.
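A small illustration, not taken from the slides, of why the trick works: a quadratic kernel evaluated in the original 2-d space reproduces the inner product of an explicit mapping into a 3-d (z1, z2, z3) space, so the higher-dimensional space never has to be constructed.

```python
import numpy as np

def phi(x):
    """Explicit mapping of a 2-d point into 3-d feature space (z1, z2, z3)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def kernel(x, xp):
    """Quadratic kernel K(x, x') = (x . x')^2, evaluated in the original space."""
    return np.dot(x, xp) ** 2

x, xp = np.array([0.3, -1.2]), np.array([2.0, 0.8])
print(np.dot(phi(x), phi(xp)))   # inner product in the mapped (z1, z2, z3) space
print(kernel(x, xp))             # same number, computed without the mapping
```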

Comments – i
Every classification task tries to solve the same fundamental problem: after adequately pre-processing the data, find a good, and practical, approximation to the Bayes decision rule:
Given X, if P(S|X) > P(B|X), choose the hypothesis S; otherwise choose B.
If we knew the densities p(X|S) and p(X|B) and the priors p(S) and p(B), we could compute the Bayes Discriminant Function (BDF): D(X) = P(S|X)/P(B|X).
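Written out with Bayes' theorem, using the quantities named on the slide:

```latex
D(X) = \frac{P(S \mid X)}{P(B \mid X)}
     = \frac{p(X \mid S)\, p(S)}{p(X \mid B)\, p(B)} ,
\qquad
\text{choose } S \text{ if } D(X) > 1,\ \text{otherwise } B .
```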

Comments – ii
The Fisher discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms to approximate the Bayes discriminant function D(X), or a function thereof.
It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.

Summary
Multivariate analysis is hard, but useful if it is important to extract as much information from the data as possible.
For classification problems, the common methods provide different approximations to the Bayes discriminant.
There is considerable empirical evidence that, as yet, no uniformly most powerful method exists. Therefore, be wary of claims to the contrary!