Object Orie’d Data Analysis, Last Time
SiZer Analysis – Statistical Inference for Histograms & S.P.s
Yeast Cell Cycle Data
OODA in Image Analysis – Landmarks, Boundary Rep’ns, Medial Rep’ns
Mildly Non-Euclidean Spaces – M-rep data on manifolds

Mildly Non-Euclidean Spaces
Statistical Analysis of M-rep Data
Recall: Many direct products of: Locations, Radii, Angles
I.e. points on a smooth manifold
Data lie in a non-Euclidean space, but only mildly non-Euclidean

Mildly Non-Euclidean Spaces
Statistical Analysis of M-rep Data
Recall: Many direct products of: Locations, Radii, Angles
Mathematical Summarization: Lie Groups and/or Symmetric Spaces

Mildly Non-Euclidean Spaces
Fréchet mean of numbers $x_1, \dots, x_n$: $\bar{x} = \operatorname{argmin}_x \sum_{i=1}^n (x_i - x)^2$
Fréchet mean in Euclidean space: $\operatorname{argmin}_x \sum_{i=1}^n \| X_i - x \|^2$
Fréchet mean on a manifold: Replace Euclidean distance by geodesic distance, $\operatorname{argmin}_x \sum_{i=1}^n d_g(X_i, x)^2$
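A minimal numerical sketch (not from the slides) of the Fréchet mean on the simplest manifold, the unit circle: minimize the sum of squared geodesic (arc-length) distances by brute force over a grid of candidate means. The function name and toy angles are hypothetical.

```python
import numpy as np

def frechet_mean_circle(angles, n_grid=3600):
    """Frechet mean of points on the unit circle (angles in radians),
    found by brute-force minimization of the summed squared geodesic
    (arc-length) distances over a fine grid of candidate means."""
    candidates = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    # geodesic distance on the circle: shortest arc between two angles
    diffs = np.abs(angles[None, :] - candidates[:, None]) % (2 * np.pi)
    geodesic = np.minimum(diffs, 2 * np.pi - diffs)
    sums = (geodesic ** 2).sum(axis=1)
    return candidates[np.argmin(sums)]

# Toy data clustered near angle 0
angles = np.array([-0.3, -0.1, 0.0, 0.2, 0.4])
print(frechet_mean_circle(angles))   # close to the ordinary average here
```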

Mildly Non-Euclidean Spaces
Useful View of Manifold Data: Tangent Space
Center: Fréchet Mean
Reason for the terminology “mildly non-Euclidean”

Mildly Non-Euclidean Spaces
Analog of PCA? Principal geodesics:
Replace the line that best fits the data
By the geodesic that best fits the data
Implemented as PCA in the tangent space, then mapped back to the surface
Fletcher (2004)
Ja-Yeon Jeong will demo in: Bladder – Prostate – Rectum example
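A rough sketch of the tangent-space idea on the unit sphere, assuming the Fréchet mean is already available: log-map the data to the tangent space at the mean, run ordinary PCA there, and exp-map directions back to the manifold. This is only a simple stand-in for PGA (Fletcher 2004), not the actual implementation; all function names and the toy data are hypothetical.

```python
import numpy as np

def log_map_sphere(mu, x):
    """Log map on the unit sphere: tangent vector at mu pointing toward x."""
    cos_t = np.clip(np.dot(mu, x), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(mu)
    v = x - cos_t * mu
    return theta * v / np.linalg.norm(v)

def exp_map_sphere(mu, v):
    """Exp map on the unit sphere: follow the geodesic from mu along v."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return mu.copy()
    return np.cos(theta) * mu + np.sin(theta) * v / theta

def tangent_pca(data, mu):
    """PCA of sphere-valued data in the tangent space at mu
    (a simple stand-in for PGA; assumes mu is a reasonable Frechet mean)."""
    tangent = np.array([log_map_sphere(mu, x) for x in data])
    _, _, vt = np.linalg.svd(tangent - tangent.mean(axis=0), full_matrices=False)
    return vt  # rows are tangent-space principal directions

# Tiny example: points near the "north pole" of the 2-sphere
rng = np.random.default_rng(0)
pts = np.array([[0.1, 0.05, 1.0] + 0.05 * rng.standard_normal(3) for _ in range(20)])
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
mu = pts.mean(axis=0); mu /= np.linalg.norm(mu)   # crude extrinsic mean
dirs = tangent_pca(pts, mu)
print(exp_map_sphere(mu, 0.3 * dirs[0]))          # first direction mapped back to the sphere
```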

Mildly Non-Euclidean Spaces
Interesting Open Problems:
Fully geodesic PGA? – E.g. data “just north of the equator” on a sphere
Gaussian Distribution on a Manifold?
Analog of Covariance?
Simulation on a Manifold?

Mildly Non-Euclidean Spaces
Aside: There is a mathematical statistics literature on “data on manifolds”:
Ruymgaart (1989)
Hendriks, Janssen & Ruymgaart (1992)
Lee & Ruymgaart (1996)
Kim (1998)
Bhattacharya & Patrangenaru (2003)
…

Strongly Non-Euclidean Spaces
Trees as Data Objects
From Graph Theory: A graph is a set of nodes and edges
A tree has a root and direction
Data Objects: a set of trees

Strongly Non-Euclidean Spaces
Motivating Example: Blood Vessel Trees in Brains
From Dr. Elizabeth Bullitt
Segmented from MRIs
Very complex structure
Want to study a population of trees
Data Objects are trees

Strongly Non-Euclidean Spaces
Real blood vessel trees (one person) – sequence of figure slides

Strongly Non-Euclidean Spaces
Statistics on a Population of Tree-Structured Data Objects?
Mean??? Analog of PCA???
Strongly non-Euclidean, since:
The space of trees is not a linear space
Not even approximately linear (no tangent plane)

Strongly Non-Euclidean Spaces
Mean of a Population of Tree-Structured Data Objects?
Natural approach: Fréchet mean
Requires a metric (distance) on tree space

Strongly Non-Euclidean Spaces
Appropriate metrics on tree space: Wang and Marron (2004)
Depend on:
–Tree structure
–And nodal attributes
Won’t go further here, but this gives an appropriate Fréchet mean
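To illustrate how the Fréchet mean only needs a metric, here is a minimal sketch that works for any distance function. It restricts the search to the observed objects themselves (a medoid-style shortcut, so the true Fréchet mean in tree space may lie elsewhere), and the toy "tree metric" below is just Euclidean distance on attribute vectors, not the Wang-Marron metric.

```python
import numpy as np

def sample_frechet_mean(objects, dist):
    """Approximate Frechet mean under an arbitrary metric `dist`,
    searching only over the observed objects (medoid-style shortcut)."""
    objects = list(objects)
    costs = [sum(dist(x, y) ** 2 for y in objects) for x in objects]
    return objects[int(np.argmin(costs))]

# Toy stand-in: "trees" coded as attribute vectors,
# distance = Euclidean distance on the attributes (NOT the Wang-Marron metric).
trees = [np.array([1.0, 2.0]), np.array([1.5, 2.5]), np.array([5.0, 0.0])]
print(sample_frechet_mean(trees, lambda a, b: np.linalg.norm(a - b)))
```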

Strongly Non-Euclidean Spaces
PCA on Tree Space? Key Ideas:
Replace the 1-d subspace that best approximates the data
By a 1-d representation that best approximates the data
Wang and Marron (2004) define the notion of a Treeline (in structure space)

Strongly Non-Euclidean Spaces
PCA on Tree Space?
Also useful to consider 1-d representations in the space of nodal attributes
Simple Example: Blood vessel trees
Just 4 nodes & simplified to sticks, for computational tractability

Strongly Non-Euclidean Spaces 4 node Blood vessel trees - Raw Data

Strongly Non-Euclidean Spaces
First PC: Note flipping of the root
Some images were upside down

Strongly Non-Euclidean Spaces
First PC projection plot: Shows all data at one end or the other,
Nobody near the middle, where the tree was degenerate in the movie

Strongly Non-Euclidean Spaces
Proposed applications in the M-rep world:
Multifigural objects with some figures missing
Multi-object images with some objects missing
…
Toy Example: hands with missing fingers

Return to Big Picture
Main statistical goals of OODA:
Understanding population structure – PCA, PGA, …
Classification (i.e. Discrimination) – Understanding 2+ populations
Time Series of Data Objects – Chemical Spectra, Mortality Data

Classification - Discrimination
Background: Two Class (Binary) version:
Using “training data” from Class +1 and Class -1
Develop a “rule” for assigning new data to a class
Canonical Example: Disease Diagnosis
New patients are “Healthy” or “Ill”
Determined based on measurements

Classification - Discrimination
Next time: go into Classification vs. Clustering
Supervised vs. Unsupervised Learning
As was done on 10/25/05

Classification - Discrimination
Terminology: For statisticians, these are synonyms
For biologists, classification means constructing taxonomies and sorting organisms into them
(maybe this is why discrimination was used, until it became politically incorrect …)

Classification (i.e. discrimination)
There are a number of: Approaches, Philosophies, Schools of Thought
Too often cast as: Statistics vs. EE – CS

Classification (i.e. discrimination)
EE – CS variations: Pattern Recognition, Artificial Intelligence, Neural Networks, Data Mining, Machine Learning

Classification (i.e. discrimination)
Differing Viewpoints:
Statistics: Model classes with probability distributions, use these to study class differences & find rules
EE – CS: Data are just sets of numbers, rules distinguish between these
Current thought: combine these

Classification (i.e. discrimination)
Important Overview Reference: Duda, Hart and Stork (2001)
Too much about neural nets??? Pizer disagrees …
Update of Duda & Hart (1973)

Classification Basics Personal Viewpoint: Point Clouds

Classification Basics
Simple and Natural Approach: Mean Difference, a.k.a. Centroid Method
Find the “skewer through two meatballs”

Classification Basics
For Simple Toy Example: Project on the MD direction & split at the center
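A minimal sketch of the Mean Difference (Centroid) rule described above: project onto the difference of the class means and split at their midpoint. The function name and toy data are hypothetical, not from the slides.

```python
import numpy as np

def centroid_classifier(X_plus, X_minus):
    """Mean Difference (Centroid) rule: project onto the difference of the
    class means and split at the midpoint between them."""
    m_plus, m_minus = X_plus.mean(axis=0), X_minus.mean(axis=0)
    w = m_plus - m_minus                    # "skewer" direction
    midpoint = (m_plus + m_minus) / 2.0
    def classify(x):
        return +1 if np.dot(w, x - midpoint) > 0 else -1
    return classify

# Toy two-class point clouds
rng = np.random.default_rng(1)
X_plus = rng.standard_normal((50, 2)) + np.array([2.0, 0.0])
X_minus = rng.standard_normal((50, 2)) + np.array([-2.0, 0.0])
rule = centroid_classifier(X_plus, X_minus)
print(rule(np.array([1.5, 0.3])), rule(np.array([-1.5, -0.3])))   # +1, -1
```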

Classification Basics
Why not use PCA? Reasonable Result?
Doesn’t use class labels … Good? Bad?

Classification Basics Harder Example (slanted clouds):

Classification Basics
PCA for slanted clouds: PC1 terrible, PC2 better?
Still misses the right direction
Doesn’t use class labels

Classification Basics
Mean Difference for slanted clouds: A little better?
Still misses the right direction
Want to account for covariance

Classification Basics
Mean Difference & Covariance, Simplest Approach:
Rescale (standardize) the coordinate axes,
i.e. replace the (full) data matrix $X$ by $D^{-1/2} X$, where $D = \operatorname{diag}(\hat{\sigma}_1^2, \dots, \hat{\sigma}_d^2)$ holds the coordinate-wise sample variances
Then do Mean Difference
Called the “Naïve Bayes Approach”
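One plausible reading of that rescaling, as a sketch: standardize each coordinate by its sample standard deviation (computed from the full data matrix, ignoring labels), then apply the Mean Difference rule in the rescaled space. The function name is hypothetical.

```python
import numpy as np

def naive_bayes_mean_difference(X_plus, X_minus):
    """'Naive Bayes' style rule from the slides: rescale each coordinate by
    its standard deviation (variances only, no covariances), then apply
    the Mean Difference rule in the rescaled space."""
    X_all = np.vstack([X_plus, X_minus])
    scale = X_all.std(axis=0)                       # per-coordinate std
    Zp, Zm = X_plus / scale, X_minus / scale
    m_plus, m_minus = Zp.mean(axis=0), Zm.mean(axis=0)
    w, mid = m_plus - m_minus, (m_plus + m_minus) / 2.0
    def classify(x):
        z = x / scale
        return +1 if np.dot(w, z - mid) > 0 else -1
    return classify
```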

Classification Basics
Problem with Naïve Bayes: Only adjusts variances, not covariances
Doesn’t solve this problem

Classification Basics
Better Solution: Fisher Linear Discrimination
Gets the right direction
How does it work?

Fisher Linear Discrimination Other common terminology (for FLD): Linear Discriminant Analysis (LDA)

Fisher Linear Discrimination
Careful development:
Useful notation (data vectors of length $d$):
Class +1: $X_1^{(+1)}, \dots, X_{n_{+1}}^{(+1)}$
Class -1: $X_1^{(-1)}, \dots, X_{n_{-1}}^{(-1)}$
Centerpoints: $\bar{X}^{(+1)} = \frac{1}{n_{+1}} \sum_{i=1}^{n_{+1}} X_i^{(+1)}$ and $\bar{X}^{(-1)} = \frac{1}{n_{-1}} \sum_{i=1}^{n_{-1}} X_i^{(-1)}$

Fisher Linear Discrimination
Covariances, for $j = +1, -1$ (outer products):
$\hat{\Sigma}^{(j)} = \tilde{X}^{(j)} \big(\tilde{X}^{(j)}\big)^t$
Based on the centered, normalized data matrices $\tilde{X}^{(j)}$, with columns $\frac{1}{\sqrt{n_j}}\big(X_i^{(j)} - \bar{X}^{(j)}\big)$
Note: use the “MLE” version of the estimated covariance matrices, for simpler notation

Fisher Linear Discrimination
Major Assumption: Class covariances are the same (or “similar”)
Like this: Not this:

Fisher Linear Discrimination
Good estimate of the (common) within-class covariance?
Pooled (weighted average) within-class covariance:
$\hat{\Sigma}_w = \dfrac{n_{+1} \hat{\Sigma}^{(+1)} + n_{-1} \hat{\Sigma}^{(-1)}}{n_{+1} + n_{-1}}$
based on the combined, class-centered full data matrix

Fisher Linear Discrimination
Note: $\hat{\Sigma}_w$ is similar to the full-data covariance $\hat{\Sigma}$ from before,
i.e. the covariance matrix ignoring class labels
Important Difference: Class-by-Class Centering
Will be important later

Fisher Linear Discrimination
Simple way to find the “correct covariance adjustment”:
Individually transform the subpopulations so they are “spherical” about their means
For $j = +1, -1$, define the transformed data $Y_i^{(j)} = \hat{\Sigma}_w^{-1/2} X_i^{(j)}$

Fisher Linear Discrimination
Then: In the transformed space, the best separating hyperplane is
the perpendicular bisector of the line segment between the class means

Fisher Linear Discrimination
In the transformed space, the separating hyperplane has:
Transformed normal vector: $n_Y = \hat{\Sigma}_w^{-1/2}\big(\bar{X}^{(+1)} - \bar{X}^{(-1)}\big)$
Transformed intercept (point on the plane): $c_Y = \hat{\Sigma}_w^{-1/2}\,\tfrac{1}{2}\big(\bar{X}^{(+1)} + \bar{X}^{(-1)}\big)$
Equation: $\{\, y : \langle y - c_Y,\ n_Y \rangle = 0 \,\}$

Fisher Linear Discrimination
Thus the discrimination rule is: Given a new data vector $X_0$,
Choose Class +1 when: $\big\langle \hat{\Sigma}_w^{-1/2} X_0 - c_Y,\ n_Y \big\rangle > 0$,
i.e. (transforming back to the original space)
$\big\langle X_0 - \tfrac{1}{2}\big(\bar{X}^{(+1)} + \bar{X}^{(-1)}\big),\ \hat{\Sigma}_w^{-1}\big(\bar{X}^{(+1)} - \bar{X}^{(-1)}\big) \big\rangle > 0$

Fisher Linear Discrimination
So (in the original space) we have a separating hyperplane with:
Normal vector: $n_w = \hat{\Sigma}_w^{-1}\big(\bar{X}^{(+1)} - \bar{X}^{(-1)}\big)$
Intercept: $c_w = \tfrac{1}{2}\big(\bar{X}^{(+1)} + \bar{X}^{(-1)}\big)$
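A minimal sketch of FLD following the normal vector and intercept just derived: pooled (MLE, class-centered) within-class covariance, direction $\hat{\Sigma}_w^{-1}$ times the mean difference, split at the midpoint of the class means. Function name and toy "slanted cloud" data are hypothetical.

```python
import numpy as np

def fld_rule(X_plus, X_minus):
    """Fisher Linear Discrimination: pooled within-class covariance,
    normal vector Sigma_w^{-1} (mean difference), split at the midpoint."""
    n_p, n_m = len(X_plus), len(X_minus)
    m_plus, m_minus = X_plus.mean(axis=0), X_minus.mean(axis=0)
    # class-centered covariances (MLE version: divide by n, not n - 1)
    S_plus = (X_plus - m_plus).T @ (X_plus - m_plus) / n_p
    S_minus = (X_minus - m_minus).T @ (X_minus - m_minus) / n_m
    S_w = (n_p * S_plus + n_m * S_minus) / (n_p + n_m)   # pooled within-class cov
    normal = np.linalg.solve(S_w, m_plus - m_minus)      # Sigma_w^{-1} (mean diff)
    midpoint = (m_plus + m_minus) / 2.0
    def classify(x):
        return +1 if np.dot(normal, x - midpoint) > 0 else -1
    return classify

# Slanted-cloud style toy data
rng = np.random.default_rng(2)
base = rng.standard_normal((100, 2)) @ np.array([[1.0, 0.8], [0.0, 0.3]])
rule = fld_rule(base[:50] + [1.0, 1.0], base[50:] + [-1.0, -1.0])
print(rule(np.array([0.8, 0.9])), rule(np.array([-0.8, -0.9])))
```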

Fisher Linear Discrimination
Relationship to Mahalanobis distance
Idea: For $x_1, x_2 \sim N(\mu, \Sigma)$, a natural distance measure is:
$d_M(x_1, x_2) = \big( (x_1 - x_2)^t \Sigma^{-1} (x_1 - x_2) \big)^{1/2}$
“Unit free”, i.e. “standardized”: essentially mods out the covariance structure
Same as the Euclidean distance applied to $\Sigma^{-1/2} x_1$ & $\Sigma^{-1/2} x_2$,
i.e. the key transformation for FLD
So FLD is the Mean Difference rule in Mahalanobis space
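A short sketch of that distance, with a hypothetical function name and toy covariance:

```python
import numpy as np

def mahalanobis(x1, x2, Sigma):
    """Mahalanobis distance: Euclidean distance after 'whitening' by Sigma^{-1/2}."""
    d = x1 - x2
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
print(mahalanobis(np.array([1.0, 0.0]), np.array([0.0, 1.0]), Sigma))
```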

Classical Discrimination
The above derivation of FLD was:
Nonstandard, not in any textbooks(?)
Nonparametric (don’t need Gaussian data), i.e. used no probability distributions
More Machine Learning than Statistics

Classical Discrimination
FLD Likelihood View
Assume: Class distributions are multivariate Gaussian, $N(\mu_{+1}, \Sigma)$ and $N(\mu_{-1}, \Sigma)$,
i.e. a strong distributional assumption + common covariance

Classical Discrimination
FLD Likelihood View (cont.)
At a location $x_0$, the likelihood ratio for choosing between Class +1 and Class -1 is:
$LR(x_0) = \dfrac{\phi_\Sigma(x_0 - \mu_{+1})}{\phi_\Sigma(x_0 - \mu_{-1})}$
where $\phi_\Sigma$ is the mean-zero Gaussian density with covariance $\Sigma$

Classical Discrimination
FLD Likelihood View (cont.)
Simplifying, using the Gaussian density $\phi_\Sigma(v) \propto \exp\!\big(-\tfrac{1}{2} v^t \Sigma^{-1} v\big)$
Gives (critically using the common covariance):
$\log LR(x_0) = -\tfrac{1}{2}(x_0 - \mu_{+1})^t \Sigma^{-1} (x_0 - \mu_{+1}) + \tfrac{1}{2}(x_0 - \mu_{-1})^t \Sigma^{-1} (x_0 - \mu_{-1})$

Classical Discrimination
FLD Likelihood View (cont.)
But the quadratic terms $x_0^t \Sigma^{-1} x_0$ cancel, so:
$\log LR(x_0) = (\mu_{+1} - \mu_{-1})^t \Sigma^{-1} \big( x_0 - \tfrac{1}{2}(\mu_{+1} + \mu_{-1}) \big)$
Thus $LR(x_0) > 1$ when
$(\mu_{+1} - \mu_{-1})^t \Sigma^{-1} \big( x_0 - \tfrac{1}{2}(\mu_{+1} + \mu_{-1}) \big) > 0$,
i.e. when $x_0$ lies on the Class +1 side of the hyperplane

Classical Discrimination
FLD Likelihood View (cont.)
Replacing $\mu_{+1}$, $\mu_{-1}$ and $\Sigma$ by their maximum likelihood estimates $\bar{X}^{(+1)}$, $\bar{X}^{(-1)}$ and $\hat{\Sigma}_w$
Gives the likelihood ratio discrimination rule:
Choose Class +1 when $\big(\bar{X}^{(+1)} - \bar{X}^{(-1)}\big)^t \hat{\Sigma}_w^{-1} \big( X_0 - \tfrac{1}{2}(\bar{X}^{(+1)} + \bar{X}^{(-1)}) \big) > 0$
Same as above, so: FLD can be viewed as the Likelihood Ratio Rule

Classical Discrimination
FLD Generalization I
Gaussian Likelihood Ratio Discrimination (a.k.a. “nonlinear discriminant analysis”)
Idea: Assume the class distributions are multivariate Gaussian, $N(\mu_{+1}, \Sigma_{+1})$ and $N(\mu_{-1}, \Sigma_{-1})$: different covariances!
The Likelihood Ratio rule is a straightforward numerical calculation
(thus one can easily implement it, and do discrimination)

Classical Discrimination
Gaussian Likelihood Ratio Discrimination (cont.)
No longer have a separating-hyperplane representation
(instead, regions determined by quadratics; fairly complicated case-wise calculations)
Graphical display: for each point, color as:
Yellow if assigned to Class +1
Cyan if assigned to Class -1
(intensity is the strength of assignment)
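A minimal sketch of the Gaussian Likelihood Ratio rule with different class covariances (quadratic decision regions): fit a Gaussian to each class by MLE and compare log-densities at a new point (constants cancel in the ratio). The function name and the donut-style toy data, echoing the donut example in the following slides, are hypothetical.

```python
import numpy as np

def gaussian_lr_rule(X_plus, X_minus):
    """Gaussian Likelihood Ratio rule with *different* class covariances:
    fit each class by MLE, classify by the sign of the log likelihood ratio."""
    def fit(X):
        mu = X.mean(axis=0)
        Sigma = (X - mu).T @ (X - mu) / len(X)   # MLE covariance
        return mu, Sigma
    mu_p, S_p = fit(X_plus)
    mu_m, S_m = fit(X_minus)
    def log_density(x, mu, Sigma):
        # Gaussian log-density up to an additive constant (cancels in the ratio)
        d = x - mu
        _, logdet = np.linalg.slogdet(Sigma)
        return -0.5 * (d @ np.linalg.solve(Sigma, d) + logdet)
    def classify(x):
        llr = log_density(x, mu_p, S_p) - log_density(x, mu_m, S_m)
        return +1 if llr > 0 else -1
    return classify

# "Donut" style example where no hyperplane can work but a quadratic can
rng = np.random.default_rng(3)
inner = 0.5 * rng.standard_normal((200, 2))                       # Class +1: center blob
theta = rng.uniform(0, 2 * np.pi, 200)
outer = np.c_[3 * np.cos(theta), 3 * np.sin(theta)] + 0.3 * rng.standard_normal((200, 2))
rule = gaussian_lr_rule(inner, outer)
print(rule(np.array([0.1, 0.0])), rule(np.array([3.0, 0.0])))     # +1, -1
```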

Classical Discrimination FLD for Tilted Point Clouds – Works well

Classical Discrimination GLR for Tilted Point Clouds – Works well

Classical Discrimination FLD for Donut – Poor, no plane can work

Classical Discrimination GLR for Donut – Works well (good quadratic)

Classical Discrimination FLD for X – Poor, no plane can work

Classical Discrimination GLR for X – Better, but not great