Return to Big Picture
Main statistical goals of OODA:
Understanding population structure
–Low dimensional Projections, PCA, …
Classification (i.e. Discrimination)
–Understanding 2+ populations
Time Series of Data Objects
–Chemical Spectra, Mortality Data
"Vertical Integration" of Data Types
–Qing Feng: JIVE

Classification - Discrimination
Background: Two Class (Binary) version:
Using "training data" from Class +1 and Class -1, develop a "rule" for assigning new data to a class.
Canonical Example: Disease Diagnosis
–New patients are "Healthy" or "Ill"
–Determine based on measurements
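
To make the two-class setup concrete, here is a minimal toy construction (my own, not the slides' data sets): simulated "training data" from Class +1 and Class -1 that the later sketches in this section reuse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy training data: two Gaussian point clouds in d = 2 dimensions,
# labeled +1 and -1 (stand-ins for "Healthy" / "Ill" measurements)
d, n_pos, n_neg = 2, 50, 50
X_pos = rng.normal(size=(n_pos, d)) + np.array([1.5, 1.0])
X_neg = rng.normal(size=(n_neg, d)) + np.array([-1.5, -1.0])
X = np.vstack([X_pos, X_neg])                            # n x d data matrix
y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])    # class labels

print(X.shape, y.shape)   # a classification "rule" maps a new x to +1 or -1 using (X, y)
```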

Fisher Linear Discrimination
Simple way to find the "correct covariance adjustment": individually transform the subpopulations so they are "spherical" about their means.
For j = +1, -1 define the class sample covariance Σ_j (computed about the class mean), pool these into the within-class covariance Σ_w, and sphere the data via the transformation x ↦ Σ_w^(-1/2) x.
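
A numpy sketch of this sphering step, continuing the toy X and y above (the pooled within-class covariance and inverse symmetric square root are standard constructions; the variable names are my own):

```python
import numpy as np

def pooled_within_cov(X, y):
    """Pooled within-class covariance: class covariances about their own means, averaged."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for label in (+1, -1):
        Xc = X[y == label] - X[y == label].mean(axis=0)
        S += Xc.T @ Xc
    return S / len(X)

def inv_sqrt(S):
    """Inverse symmetric square root via eigendecomposition (assumes S is nonsingular)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

Sigma_w = pooled_within_cov(X, y)
X_sphered = X @ inv_sqrt(Sigma_w)   # each class is now roughly spherical about its mean
```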

Fisher Linear Discrimination
So (back in the original space) we have a separating hyperplane with:
–Normal vector: w = Σ_w^(-1) (m_{+1} - m_{-1}), the inverse pooled within-class covariance applied to the difference of the class means
–Intercept: the hyperplane passes through the midpoint of the class means, (m_{+1} + m_{-1}) / 2
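
Continuing the sketch, the resulting rule in the original coordinates (hypothetical helper names; the midpoint intercept corresponds to equal prior probabilities):

```python
import numpy as np

m_pos = X[y == +1].mean(axis=0)
m_neg = X[y == -1].mean(axis=0)

w = np.linalg.solve(Sigma_w, m_pos - m_neg)   # normal vector: Sigma_w^{-1} (m_pos - m_neg)
midpoint = (m_pos + m_neg) / 2.0              # hyperplane passes through the mean midpoint

def fld_classify(x_new):
    """Fisher Linear Discrimination: which side of the separating hyperplane is x_new on?"""
    return +1 if w @ (x_new - midpoint) > 0 else -1

print(fld_classify(np.array([0.3, -0.2])))
```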

Classical Discrimination
Above derivation of FLD was:
–Nonstandard
–Not in any textbooks(?)
–Nonparametric (don't need Gaussian data)
–I.e. used no probability distributions
–More Machine Learning than Statistics

Classical Discrimination
FLD Likelihood View (cont.)
Replacing the class means μ_{+1}, μ_{-1} and the common covariance Σ by their maximum likelihood estimates m_{+1}, m_{-1} and Σ_w gives the likelihood ratio discrimination rule:
Choose Class +1 when the estimated Gaussian likelihood for Class +1 exceeds that for Class -1, i.e. when w · (x - midpoint) > 0.
Same as above, so: FLD can be viewed as a Likelihood Ratio Rule.
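
A quick check of this equivalence, continuing the variables above: the Gaussian log-likelihood ratio with a shared covariance estimate reduces (after the quadratic terms cancel) to the linear FLD rule.

```python
import numpy as np

def loglik_ratio(x, m_pos, m_neg, Sigma):
    """Log of N(x; m_pos, Sigma) / N(x; m_neg, Sigma); the x' Sigma^{-1} x terms cancel."""
    Sinv = np.linalg.inv(Sigma)
    q_pos = -0.5 * (x - m_pos) @ Sinv @ (x - m_pos)
    q_neg = -0.5 * (x - m_neg) @ Sinv @ (x - m_neg)
    return q_pos - q_neg

x0 = np.array([0.3, -0.2])
print(np.isclose(loglik_ratio(x0, m_pos, m_neg, Sigma_w), w @ (x0 - midpoint)))  # True
```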

Classical Discrimination
Summary of FLD vs. GLR:
Tilted Point Clouds Data
–FLD good
–GLR good
Donut Data
–FLD bad
–GLR good
X Data
–FLD bad
–GLR OK, not great
Classical Conclusion: GLR generally better
(will see a different answer for HDLSS data)
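
For comparison, a sketch of GLR as I read it here: a Gaussian likelihood ratio with separate mean and covariance estimates per class (a quadratic rule), which is why it can handle curved boundaries like the donut data. This reading of "GLR" is an assumption on my part; it continues the toy X, y from earlier.

```python
import numpy as np
from numpy.linalg import slogdet, solve

def gaussian_loglik(x, mean, cov):
    """Gaussian log-density, up to the shared -(d/2) log(2*pi) constant."""
    _, logdet = slogdet(cov)
    diff = x - mean
    return -0.5 * (logdet + diff @ solve(cov, diff))

def glr_classify(x_new, X, y):
    """Choose the class whose fitted Gaussian (own mean and own covariance) is more likely."""
    scores = {}
    for label in (+1, -1):
        Xc = X[y == label]
        scores[label] = gaussian_loglik(x_new, Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return +1 if scores[+1] > scores[-1] else -1

print(glr_classify(np.array([0.3, -0.2]), X, y))
```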

Classical Discrimination
FLD Generalization II (Gen. I was GLR)
Different prior probabilities
Main idea: give different weights to the 2 classes, i.e. assume they are not a priori equally likely
Development is "straightforward":
–Modified likelihood
–Change intercept in FLD
Won't explore further here
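
Under the likelihood view above, unequal priors π_{+1}, π_{-1} only shift the intercept by the log prior ratio; a minimal sketch of that modified rule (continuing w and midpoint from earlier, with the priors as hypothetical inputs):

```python
import numpy as np

def fld_classify_priors(x_new, w, midpoint, pi_pos, pi_neg):
    """FLD with unequal prior probabilities: the log prior ratio shifts the intercept."""
    return +1 if w @ (x_new - midpoint) + np.log(pi_pos / pi_neg) > 0 else -1

print(fld_classify_priors(np.array([0.3, -0.2]), w, midpoint, pi_pos=0.9, pi_neg=0.1))
```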

Classical Discrimination
FLD Generalization III: Principal Discriminant Analysis
Idea: FLD-like approach to > 2 classes
Assumption: class covariance matrices are the same (similar)
(but not Gaussian; same situation as for FLD)

Classical Discrimination
Principal Discriminant Analysis (cont.)
But PCA only works like Mean Difference; expect to improve by taking covariance into account.
Blind application of the above ideas suggests eigen-analysis of Σ_w^(-1) Σ_B, the between-class covariance adjusted by the inverse within-class covariance.
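
A sketch of that eigen-analysis on a 3-class toy example (my own construction), done through a symmetric reformulation of Σ_w^(-1) Σ_B so that plain numpy suffices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-class toy data in d = 4 dimensions
d, n_per_class = 4, 30
means = [np.array([2, 0, 0, 0.]), np.array([0, 2, 0, 0.]), np.array([-2, -2, 0, 0.])]
X = np.vstack([rng.normal(size=(n_per_class, d)) + m for m in means])
labels = np.repeat(np.arange(3), n_per_class)

grand_mean = X.mean(axis=0)
Sigma_w = np.zeros((d, d))   # pooled within-class covariance
Sigma_b = np.zeros((d, d))   # between-class covariance of the class means
for k in range(3):
    Xk = X[labels == k]
    mk = Xk.mean(axis=0)
    Sigma_w += (Xk - mk).T @ (Xk - mk) / len(X)
    Sigma_b += len(Xk) / len(X) * np.outer(mk - grand_mean, mk - grand_mean)

# Eigen-analysis of Sigma_w^{-1} Sigma_b via the symmetric matrix W^{-1/2} Sigma_b W^{-1/2}
vals, vecs = np.linalg.eigh(Sigma_w)
W_inv_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
evals, evecs = np.linalg.eigh(W_inv_half @ Sigma_b @ W_inv_half)
discrim_dirs = W_inv_half @ evecs[:, ::-1][:, :2]   # top 2 discriminant directions
print(discrim_dirs.shape)
```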

Classical Discrimination
Summary of Classical Ideas:
Among "Simple Methods"
–MD and FLD sometimes similar
–Sometimes FLD better
–So FLD is preferred
Among Complicated Methods
–GLR is best
–So always use that?
Caution:
–Story changes for HDLSS settings

HDLSS Discrimination
Main HDLSS issues:
–Sample Size, n < Dimension, d
–Singular covariance matrix, so can't use the matrix inverse
–I.e. can't standardize (sphere) the data (requires the root inverse covariance)
–Can't do classical multivariate analysis
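
A quick numerical illustration of the singular-covariance problem (toy numbers of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 20, 50                              # HDLSS: sample size n < dimension d
X = rng.normal(size=(n, d))
Sigma_hat = np.cov(X, rowvar=False)        # d x d sample covariance

print(np.linalg.matrix_rank(Sigma_hat))    # at most n - 1 = 19 < 50, so singular
# np.linalg.inv(Sigma_hat) is therefore meaningless here;
# neither sphering nor classical FLD can be applied directly.
Sigma_pinv = np.linalg.pinv(Sigma_hat)     # a pseudo-inverse is one common workaround
```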

HDLSS Discrimination
Add a 3rd dimension (noise)
Project on the 2-d subspace generated by the optimal direction and the FLD direction

HDLSS Discrimination Movie Through Increasing Dimensions

HDLSS Discrimination
FLD in Increasing Dimensions:
Low dimensions (d = 2-9):
–Visually good separation
–Small angle between FLD and Optimal
–Good generalizability
Medium dimensions (d = 10-26):
–Visual separation too good?!?
–Larger angle between FLD and Optimal
–Worse generalizability
–Feel effect of sampling noise

HDLSS Discrimination
FLD in Increasing Dimensions:
High dimensions (d = 27-37):
–Much worse angle
–Very poor generalizability
–But very small within-class variation
–Poor separation between classes
–Large separation / variation ratio

HDLSS Discrimination
FLD in Increasing Dimensions:
At the HDLSS boundary (d = 38):
–38 = degrees of freedom (need to estimate 2 class means)
–Within-class variation = 0?!?
–Data pile up on just two points
–Perfect separation / variation ratio?
–But only feels microscopic noise aspects, so likely not generalizable
–Angle to optimal very large

HDLSS Discrimination
FLD in Increasing Dimensions:
Just beyond the HDLSS boundary (d = 39-70):
–Improves with higher dimension?!?
–Angle gets better
–Improving generalizability?
–More noise helps classification?!?

HDLSS Discrimination
FLD in Increasing Dimensions:
Far beyond the HDLSS boundary (d = …):
–Quality degrades
–Projections look terrible (populations overlap)
–And generalizability falls apart as well
–Math worked out by Bickel & Levina (2004)
–Problem is estimation of the d x d covariance matrix
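
A rough simulation in the spirit of the dimension sweep described above (my own toy setup; the slides' actual data, sample sizes, and dimensions may differ): the angle between the estimated FLD direction and the theoretically optimal direction as d grows with n fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def fld_direction(X, y):
    """FLD direction using a pseudo-inverse so it stays defined past the HDLSS boundary."""
    m_pos = X[y == +1].mean(axis=0)
    m_neg = X[y == -1].mean(axis=0)
    Xc = np.vstack([X[y == +1] - m_pos, X[y == -1] - m_neg])
    Sigma_w = Xc.T @ Xc / len(Xc)
    return np.linalg.pinv(Sigma_w) @ (m_pos - m_neg)

def angle_deg(u, v):
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

n_per_class = 20
for d in (2, 10, 38, 70, 200):
    # Class means differ only in the first coordinate; identity covariance,
    # so the theoretically optimal direction is the first coordinate axis.
    mu = np.zeros(d); mu[0] = 2.0
    X = np.vstack([rng.normal(size=(n_per_class, d)) + mu,
                   rng.normal(size=(n_per_class, d)) - mu])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    e1 = np.zeros(d); e1[0] = 1.0
    print(d, round(angle_deg(fld_direction(X, y), e1), 1))
```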

HDLSS Discrimination
Simple Solution: Mean Difference (Centroid) Method
Recall not classically recommended:
–Usually no better than FLD
–Sometimes worse
But avoids estimation of covariance:
–Means are very stable
–Don't feel HDLSS problem
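
A sketch of the Mean Difference (Centroid) rule, continuing the toy X, y from the start of the section: note that no covariance matrix is estimated, which is exactly why it stays stable in HDLSS settings.

```python
import numpy as np

def mean_difference_classify(x_new, X, y):
    """Centroid rule: project onto the difference of class means, split at the midpoint."""
    m_pos = X[y == +1].mean(axis=0)
    m_neg = X[y == -1].mean(axis=0)
    w_md = m_pos - m_neg                   # direction uses means only, no covariance
    midpoint = (m_pos + m_neg) / 2.0
    return +1 if w_md @ (x_new - midpoint) > 0 else -1

print(mean_difference_classify(np.array([0.3, -0.2]), X, y))
```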

HDLSS Discrimination
Mean Difference (Centroid) Method:
–Far more stable over dimensions, because it is the likelihood ratio solution (for known-variance Gaussians)
–Doesn't feel the HDLSS boundary
–Eventually becomes too good?!? Widening gap between clusters?!?
–Careful: the angle to optimal grows, so generalizability is lost (since the noise increases)
HDLSS data present some odd effects …