Multivariate Analysis: Past, Present and Future


Multivariate Analysis: Past, Present and Future
Harrison B. Prosper, Florida State University
PHYSTAT 2003, 10 September 2003

Outline
Introduction
Historical Note
Current Practice
Issues
Summary

Introduction
Data are invariably multivariate:
Particle physics: (η, φ, E, t)
Astrophysics: (θ, φ, E, t)

Introduction – II: A Textbook Example
Object      Variables
Jet 1 (b)   3
Jet 2       3
Jet 3       3
Jet 4 (b)   3
Positron    3
Neutrino    2
Total       17

Introduction – III
Astrophysics/particle physics similarities:
Interesting events occur at random (Poisson processes)
Backgrounds are important
Experimental response functions
Huge datasets

Introduction – IV
Differences:
In particle physics we control when events occur and under what conditions
We have detailed predictions of the relative frequency of various outcomes

Introduction – V
All we do is count! Our experiments are ideal Bernoulli trials: at Fermilab each collision, that is, each trial, is conducted the same way every 400 ns. de Finetti's analysis of exchangeable trials is an accurate model of what we do.

Introduction – VI
Typical analysis tasks:
Data compression
Clustering and cluster characterization
Classification/discrimination
Estimation
Model selection/hypothesis testing
Optimization

Historical Note
Karl Pearson (1857–1936)
R.A. Fisher (1890–1962)
P.C. Mahalanobis (1893–1972)

Historical Note – Iris Data
Iris versicolor and Iris setosa
R.A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics 7, 179–188 (1936)

Iris Data
Variables:
X1: sepal length
X2: sepal width
X3: petal length
X4: petal width
"What linear function of the four measurements will maximize the ratio of the difference between the specific means to the standard deviations within species?" (R.A. Fisher)

Fisher Linear Discriminant (1936)
Solution: w ∝ Σ⁻¹(μ₁ − μ₂), where μ₁ and μ₂ are the class means and Σ is the common within-class covariance matrix. The resulting discriminant λ(x) = wᵀx is the same, within a constant, as the log-likelihood ratio for two Gaussian densities sharing the covariance Σ.
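In code the solution is a single linear solve; a minimal numpy sketch (the toy data standing in for the two iris species are illustrative, not Fisher's measurements):

```python
import numpy as np

def fisher_weights(X1, X2):
    """Fisher linear discriminant: w ~ Sigma^-1 (mu1 - mu2),
    with Sigma the pooled within-class covariance matrix."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    cov = (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)) / 2.0
    return np.linalg.solve(cov, mu1 - mu2)

# Toy stand-ins for the four iris measurements of two species
rng = np.random.default_rng(0)
X1 = rng.normal([5.9, 2.8, 4.3, 1.3], 0.3, size=(50, 4))  # "versicolor"
X2 = rng.normal([5.0, 3.4, 1.5, 0.2], 0.3, size=(50, 4))  # "setosa"

w = fisher_weights(X1, X2)
lam1, lam2 = X1 @ w, X2 @ w       # projected discriminant values per class
print(w)
print(lam1.mean(), lam2.mean())   # well-separated projections
```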

Current Practice in Particle Physics
Reducing the number of variables:
Principal Component Analysis (PCA)
Discrimination/classification:
Fisher Linear Discriminant (FLD)
Random Grid Search (RGS)
Feedforward Neural Network (FNN)
Kernel Density Estimation (KDE)
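For the variable-reduction step, PCA amounts to an eigen-decomposition of the sample covariance matrix; a minimal sketch with illustrative toy data:

```python
import numpy as np

def pca(X, k):
    """Project X onto its k leading principal components."""
    Xc = X - X.mean(axis=0)              # center each variable
    cov = np.cov(Xc, rowvar=False)       # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]   # indices of the k largest variances
    return Xc @ vecs[:, order], vals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
X[:, 0] += 3 * X[:, 1]                   # inject a correlation for PCA to find
Z, variances = pca(X, k=2)
print(Z.shape, variances)                # (1000, 2) and the two leading variances
```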

Current Practice – II
Parameter estimation:
Maximum Likelihood (ML)
Bayesian (KDE and analytical methods); e.g., see the talk by Florencia Canelli (12A)
Weighting:
Usually 0/1 weights, referred to as "cuts"
Sometimes the R. Barlow method is used
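As a reminder of what the ML step looks like in practice, a minimal sketch that fits a Gaussian by minimizing the negative log-likelihood (the sample and starting values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
data = rng.normal(2.0, 0.5, size=1000)   # toy sample

def nll(params):
    """Negative log-likelihood of a Gaussian model;
    sigma = exp(log_sigma) keeps the width positive."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + len(data) * np.log(sigma)

fit = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)                 # ML estimates, close to (2.0, 0.5)
```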

Cuts (0/1 weights)
We refer to (x0, y0) as a cut-point: points that lie below the cuts are "cut out", and S and B count the signal and background events that survive.

Grid Search
Apply cuts at each grid point, compute some measure of their effectiveness, and choose the most effective cuts.
Curse of dimensionality: the number of cut-points grows as N_bin^N_dim.

Random Grid Search
Take each point of the signal class as a cut-point. With n = # events in a sample and k = # events surviving the cuts, the surviving fraction is k/n; plotting the background fraction against the signal fraction for all cut-points maps out the achievable cuts.
H.B.P. et al., Proceedings, CHEP 1995
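A minimal sketch of the random grid search (assuming, for illustration, cuts of the form x > x0 and y > y0; the toy samples are not from the DØ analyses):

```python
import numpy as np

def random_grid_search(sig, bkg):
    """Use each signal event as a candidate cut-point and record the
    signal and background fractions surviving the cuts (k/n)."""
    results = []
    for cut in sig:                               # each signal point is a cut-point
        f_sig = np.all(sig > cut, axis=1).mean()  # signal fraction surviving
        f_bkg = np.all(bkg > cut, axis=1).mean()  # background fraction surviving
        results.append((cut, f_sig, f_bkg))
    return results

rng = np.random.default_rng(2)
sig = rng.normal(1.0, 1.0, size=(500, 2))         # toy signal
bkg = rng.normal(0.0, 1.0, size=(500, 2))         # toy background
curve = random_grid_search(sig, bkg)
# One then picks the cut-point optimizing some figure of merit, e.g. s/sqrt(b).
```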

Example: DØ Top Discovery (1995)

Optimal Discrimination
The likelihood ratio r(x, y) = p(x, y|S) / p(x, y|B): the contour r(x, y) = constant defines the optimal decision boundary. Equivalently one can use the Bayes discriminant D(x, y) = p(S|x, y) = s / (s + b), where s and b are the signal and background densities.
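In practice the densities must be estimated; a minimal sketch using kernel density estimation (scipy's gaussian_kde; the two-Gaussian toy model is illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
sig = rng.normal(1.0, 1.0, size=(2, 2000))   # gaussian_kde expects shape (dim, n)
bkg = rng.normal(0.0, 1.0, size=(2, 2000))

s_kde, b_kde = gaussian_kde(sig), gaussian_kde(bkg)

def bayes_discriminant(points):
    """D = s / (s + b), assuming equal signal and background priors."""
    s, b = s_kde(points), b_kde(points)
    return s / (s + b)

grid = np.array([[0.0, 0.5, 1.0],            # x coordinates
                 [0.0, 0.5, 1.0]])           # y coordinates
print(bayes_discriminant(grid))              # D rises toward the signal region
```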

Feedforward Neural Networks
Applications: discrimination, parameter estimation, function and density estimation.
Basic idea: encode the mapping (Kolmogorov, 1950s) using a set of 1-D functions.
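Concretely, a one-hidden-layer network is a sum of sigmoids of linear projections of the inputs; a minimal sketch of the forward pass (the weights would come from training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, W1, b1, w2, b2):
    """One-hidden-layer feedforward net: each hidden unit applies a
    1-D squashing function to a linear combination of the inputs."""
    hidden = sigmoid(x @ W1 + b1)     # shape (n_hidden,)
    return sigmoid(hidden @ w2 + b2)  # scalar output in (0, 1)
```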

Example: DØ Search for Leptoquarks
[Feynman diagram: leptoquark (LQ) pair production from quarks and gluons]

Issues
Method choice: life is short and data finite, so how should one choose a method?
Model complexity: How does one reduce the dimensionality of the data while minimizing the loss of "information"? How many model parameters? How should one avoid over-fitting?

Issues – II
Model robustness: Is a cut on a multivariate discriminant necessarily more sensitive to modeling errors than a cut on each of its input variables? What is a practical, but useful, way to assess sensitivity to modeling errors and robustness with respect to assumptions?

Issues – III
Accuracy of predictions: How should one place "error bars" on multivariate-based results? Is a Bayesian approach useful?
Goodness of fit: How can this be done in multiple dimensions?

Summary
After ~80 years of effort we have many powerful methods of analysis, a few of which are now used routinely in physics analyses. The most pressing need is to understand some issues better, so that when the data tsunami strikes we can respond sensibly.

FNN – Probabilistic Interpretation
Minimize the empirical risk function R(w) = (1/2N) Σᵢ [tᵢ − n(xᵢ, w)]² with respect to the weights w. The solution, for large N, is n(x, w*) = E[t|x]. If the target is t(x) = 1 when x is of class k and 0 otherwise, then n(x) approximates the class probability p(k|x).
D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296–298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303–305 (1990)
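This can be checked numerically: train even a single-sigmoid "network" by gradient descent on the empirical risk and its output approaches the exact posterior. A sketch, assuming two unit-width Gaussian classes at ±1 with equal priors (so p(S|x) = 1/(1 + e^(−2x)) exactly):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(+1, 1, 5000), rng.normal(-1, 1, 5000)])
t = np.concatenate([np.ones(5000), np.zeros(5000)])   # targets: 1 = signal

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, b = 0.0, 0.0                      # n(x) = sigmoid(w*x + b)
for _ in range(2000):                # full-batch gradient descent on the risk
    n = sigmoid(w * x + b)
    g = (n - t) * n * (1 - n)        # derivative of (n - t)^2 / 2 w.r.t. the pre-activation
    w -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)

for xv in (-1.0, 0.0, 1.0):          # trained net vs. exact posterior p(S|x)
    print(xv, sigmoid(w * xv + b), 1.0 / (1.0 + np.exp(-2 * xv)))
```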

Self-Organizing Map
Basic idea (Kohonen, 1988): map each of K feature vectors X = (x1, ..., xN)ᵀ into one of M regions of interest, defined by code vectors w_m, so that every X mapped to a given w_m is closer to it than to all the remaining w_m. Basically, perform a coarse-graining of the feature space.
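A minimal sketch of the Kohonen update rule for a 1-D map (the schedules and sizes are illustrative choices, not from the original):

```python
import numpy as np

def som_1d(X, M=10, steps=5000, lr=0.5, sigma=2.0):
    """Minimal 1-D Kohonen map: M code vectors coarse-grain feature space."""
    rng = np.random.default_rng(5)
    W = X[rng.choice(len(X), M, replace=False)]        # init codes from the data
    for t in range(steps):
        x = X[rng.integers(len(X))]
        m = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
        frac = 1.0 - t / steps                         # decaying schedules
        for j in range(M):                             # neighborhood update
            h = np.exp(-((j - m) ** 2) / (2 * (sigma * frac + 1e-3) ** 2))
            W[j] += lr * frac * h * (x - W[j])
    return W

X = np.random.default_rng(6).normal(size=(2000, 3))
codes = som_1d(X)     # each input is then represented by its nearest code vector
```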

Support Vector Machines
Basic idea: data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension. Use a linear discriminant to partition the high-dimensional feature space.
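The mapping idea in miniature, with an explicit feature map (a real SVM does this implicitly through a kernel; the toy data are illustrative):

```python
import numpy as np

# 1-D data: class A sits at |x| ~ 2, class B near 0 -- no single
# threshold on x separates them.
rng = np.random.default_rng(7)
a = np.concatenate([rng.normal(-2, 0.2, 100), rng.normal(2, 0.2, 100)])
b = rng.normal(0, 0.2, 200)

# Map to 2-D with phi(x) = (x, x^2): a horizontal line in the new space
# now separates the classes, i.e. a linear discriminant suffices.
phi = lambda x: np.column_stack([x, x ** 2])
A, B = phi(a), phi(b)
print(A[:, 1].min(), B[:, 1].max())   # class A's smallest x^2 exceeds class B's largest
```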

Independent Component Analysis
Basic idea: assume X = (x1, ..., xN)ᵀ is a linear sum X = AS of independent sources S = (s1, ..., sN)ᵀ. Both A, the mixing matrix, and S are unknown. Find a de-mixing matrix T such that the components of U = TX are statistically independent.
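A minimal sketch using scikit-learn's FastICA (one standard ICA algorithm; the sine/square sources and the mixing matrix are illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources: a sine wave and a square wave
t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(3 * t), np.sign(np.sin(5 * t))])

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])            # unknown mixing matrix
X = S @ A.T                           # observed mixtures, X = A S

ica = FastICA(n_components=2, random_state=0)
U = ica.fit_transform(X)              # recovered sources, up to order and scale
# ica.components_ plays the role of the de-mixing matrix T (up to centering)
```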
