Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-Jiang Zhang, Qiang Yang, Stephen Lin. Presented by meconin.

Outline Introduction Graph Embedding (GE) Marginal Fisher Analysis (MFA) Experiments Conclusion and Future Work

Introduction: Dimensionality Reduction (linear methods). PCA and LDA are the two most popular linear methods, owing to their simplicity and effectiveness. LPP preserves local relationships in the data set and uncovers its essential manifold structure.

Introduction: Dimensionality Reduction (nonlinear methods). ISOMAP, LLE, and Laplacian Eigenmap are three algorithms that have been developed recently. Kernel trick: linear methods become nonlinear ones by performing the linear operations in a higher- or even infinite-dimensional feature space induced by a kernel mapping function.

Introduction: Dimensionality Reduction (tensor-based algorithms). 2DPCA, 2DLDA, DATER.

Introduction. Graph Embedding is a general framework for dimensionality reduction. With its linearization, kernelization, and tensorization, we obtain a unified view for understanding DR algorithms. The above-mentioned algorithms can all be reformulated within it.

Introduction. This paper shows that GE can be used as a platform for developing new DR algorithms, e.g., Marginal Fisher Analysis (MFA), which overcomes the limitations of LDA.

Introduction: LDA (Linear Discriminant Analysis). It finds the linear combination of features that best separates the classes of objects. The number of available projection directions is lower than the number of classes. Because it is based on interclass and intraclass scatters, it is optimal only when the data of each class are approximately Gaussian distributed.

Introduction: MFA advantages (compared with LDA). The number of available projection directions is much larger. There is no assumption on the data distribution, so it is more general for discriminant analysis. The interclass margin can better characterize the separability of different classes.

Graph Embedding. For a classification problem, the sample set is represented as a matrix X = [x1, x2, …, xN], xi ∈ R^m. In practice the feature dimension m is often very high, so it is necessary to transform the data to a low-dimensional representation yi = F(xi) for all i.

Graph Embedding

Graph Embedding. Although DR algorithms have different motivations, their objectives are similar: to derive a lower-dimensional representation. Can we reformulate them within a unifying framework? Does such a framework assist in designing new algorithms?

Graph Embedding gives a possible answer. Represent each vertex of a graph as a low-dimensional vector that preserves the similarities between vertex pairs, where the similarity matrix of the graph characterizes certain statistical or geometric properties of the data set.

Graph Embedding. Let G = {X, W} be an undirected weighted graph with vertex set X and similarity matrix W ∈ R^(N×N). The diagonal degree matrix D and the Laplacian matrix L of the graph G are defined as L = D − W, with D_ii = Σ_j W_ij for all i.
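
As a quick illustration of these definitions, here is a minimal NumPy sketch (assuming W is given as a dense N×N array) that forms the degree matrix D and the Laplacian L = D − W:

```python
# Minimal sketch of the definitions above; assumes W is a dense
# N x N NumPy array holding the nonnegative similarity matrix.
import numpy as np

def graph_laplacian(W):
    """Return the diagonal degree matrix D and the Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))   # D_ii = sum_j W_ij
    L = D - W
    return D, L
```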

Graph Embedding. The graph embedding of G is an algorithm to find low-dimensional vector representations that preserve the relationships among the vertices of G: y* = argmin_{yᵀBy = d} Σ_{i≠j} ||y_i − y_j||² W_ij = argmin_{yᵀBy = d} yᵀLy, where B is the constraint matrix and d is a constant, introduced to avoid a trivial solution.

Graph Embedding. The larger the similarity between samples x_i and x_j, the smaller the distance between y_i and y_j should be in order to minimize the objective function. To offer mappings for data points throughout the entire feature space, three extensions are used: linearization, kernelization, and tensorization.

Graph Embedding. Linearization: assume y = Xᵀw, so each sample is projected along a linear direction w. Kernelization: map each sample into a high-dimensional Hilbert space F via φ: x → φ(x), assuming w = Σ_i α_i φ(x_i).

Graph Embedding. The solutions are obtained by solving a generalized eigenvalue decomposition problem; e.g., for the linearization, X L Xᵀ w = λ X B Xᵀ w. [F. Chung, "Spectral Graph Theory," Regional Conf. Series in Math., no. 92, 1997]
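
A sketch of this eigen-solution for the linearized case is shown below; it assumes X stores samples as columns and that L and B are the N×N Laplacian and constraint matrices. The small ridge term is an added assumption for numerical stability, not part of the paper.

```python
# Sketch of the linearized solution; assumes X is m x N with samples as
# columns, and L, B are the N x N Laplacian and constraint matrices.
import numpy as np
from scipy.linalg import eigh

def linear_graph_embedding(X, L, B, dim):
    """Solve X L X^T w = lambda X B X^T w and keep the eigenvectors
    with the smallest eigenvalues as projection directions."""
    A = X @ L @ X.T
    C = X @ B @ X.T + 1e-8 * np.eye(X.shape[0])  # ridge keeps C positive definite
    eigvals, eigvecs = eigh(A, C)   # eigenvalues returned in ascending order
    return eigvecs[:, :dim]         # m x dim projection matrix
```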

Graph Embedding: Tensorization. The feature extracted from an object may contain higher-order structure. For example, an image is a second-order tensor, and sequential data such as a video sequence is a third-order tensor.

Graph Embedding: Tensorization. In an n-dimensional space there are n^r directions, where r is the rank (order) of the tensor. For tensors A, B ∈ R^(m1×m2×…×mn), the inner product is ⟨A, B⟩ = Σ_{i1,…,in} A_{i1…in} B_{i1…in}.
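
A tiny sketch of the tensor inner product just stated, assuming A and B are NumPy arrays of the same shape:

```python
# Tiny sketch of the tensor inner product; assumes A and B are NumPy
# arrays of identical shape m1 x m2 x ... x mn.
import numpy as np

def tensor_inner(A, B):
    """<A, B> = sum over all indices of A[i1,...,in] * B[i1,...,in]."""
    return float(np.sum(A * B))
```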

Graph Embedding: Tensorization. For a matrix U ∈ R^(m_k × m'_k), the mode-k product is B = A ×_k U, which replaces the k-th mode of A: B_{i1…ik−1, j, ik+1…in} = Σ_{ik} A_{i1…in} U_{ik, j}.
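
The mode-k product can be sketched with NumPy as follows; the index convention (contracting U's first axis against mode k of A) is an assumption forced by the shape U ∈ R^(m_k × m'_k):

```python
# Sketch of the mode-k product B = A x_k U; k is a zero-based mode index.
import numpy as np

def mode_k_product(A, U, k):
    """B[..., j, ...] = sum_i A[..., i, ...] * U[i, j] along mode k."""
    B = np.tensordot(A, U, axes=([k], [0]))  # contracted axis moves to the end
    return np.moveaxis(B, -1, k)             # place the new mode back at position k
```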

Graph Embedding: Tensorization. The objective function is the tensor counterpart of the graph-preserving criterion. In many cases there is no closed-form solution, but we can obtain a local optimum by optimizing each projection vector in turn while fixing the others.

General Framework for DR. The differences among DR algorithms lie in (1) the computation of the similarity matrix of the graph and (2) the selection of the constraint matrix.

General Framework for DR

General Framework for DR: PCA. PCA seeks the projection directions with maximal variance; in the graph embedding view, it finds and removes the projection directions with minimal variance.
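
As a sketch of how PCA fits the framework's unified table, the following builds the PCA similarity and constraint matrices under the assumption W_ij = 1/N for i ≠ j and B = I. Feeding these into the linear graph-embedding sketch above yields the minimal-variance directions, which PCA removes.

```python
# Sketch of the PCA entry in the unified framework; the weights
# W_ij = 1/N (i != j) and constraint B = I are assumed.
import numpy as np

def pca_graph(N):
    W = np.full((N, N), 1.0 / N)
    np.fill_diagonal(W, 0.0)   # no self-similarity
    B = np.eye(N)
    return W, B
```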

General Framework for DR. KPCA applies the kernel trick to PCA, hence it is a kernelization of graph embedding. 2DPCA is a simplified second-order tensorization of PCA that optimizes only one projection direction.

General Framework for DR LDA searches for the directions that are most effective for discrimination by minimizing the ratio between the intraclass and interclass scatters

General Framework for DR LDA

General Framework for DR: LDA. LDA follows the linearization of graph embedding: the intrinsic graph connects all pairs of samples with the same class label, and the weights are in inverse proportion to the sample size of the corresponding class.
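
A minimal sketch of the intrinsic-graph weights just described, assuming labels is a length-N vector of integer class labels:

```python
# Sketch of the LDA intrinsic-graph weights described above.
import numpy as np

def lda_intrinsic_graph(labels):
    """W_ij = 1/n_c when samples i and j share class c, else 0."""
    labels = np.asarray(labels)
    N = len(labels)
    W = np.zeros((N, N))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)  # same-class pairs weighted by 1/n_c
    return W
```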

General Framework for DR. The intrinsic graph of PCA is used as the penalty graph of LDA. (Figure: graph structures of PCA and LDA.)

General Framework for DR. KDA is the kernel extension of LDA; 2DLDA is the second-order tensorization of LDA; DATER is the tensorization of LDA in arbitrary order.

General Framework for DR: LPP, ISOMAP, LLE, Laplacian Eigenmap (LE).

Related Works: Kernel Interpretation (Ham et al.). KPCA, ISOMAP, LLE, and LE share a common KPCA formulation with different kernel definitions. Their view is built on the kernel matrix, whereas graph embedding is built on the Laplacian matrix derived from the similarity matrix; their interpretation covers only unsupervised methods, whereas graph embedding is more general.

Related Works: Out-of-Sample Extension (Brand). Brand mentioned the concept of graph embedding; his work can be considered a special case of the graph embedding framework of this paper.

Related Works: Laplacian Eigenmap. It works with only a single graph, i.e., the intrinsic graph, and cannot be used to explain algorithms such as ISOMAP, LLE, and LDA. Some works use a Gaussian function to compute the nonnegative similarity matrix.

Marginal Fisher Analysis

Marginal Fisher Analysis Intraclass compactness (intrinsic graph)

Marginal Fisher Analysis Interclass separability (penalty graph)
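
A sketch of the two graphs, assuming X holds samples as rows, labels gives the class of each sample, and k1, k2 are the neighbor counts: in the intrinsic graph each sample is linked to its k1 nearest same-class neighbors, and in the penalty graph the k2 shortest interclass pairs of each class are linked.

```python
# Sketch of the two MFA graphs; variable names and the exact tie-breaking
# are assumptions, the structure follows the description above.
import numpy as np
from scipy.spatial.distance import cdist

def mfa_graphs(X, labels, k1, k2):
    """Intrinsic graph: link each sample to its k1 nearest same-class
    neighbors. Penalty graph: for each class, link its k2 shortest
    interclass pairs."""
    labels = np.asarray(labels)
    N = X.shape[0]
    dist = cdist(X, X)
    W = np.zeros((N, N))     # intraclass compactness graph
    Wp = np.zeros((N, N))    # interclass separability (penalty) graph
    for i in range(N):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        for j in same[np.argsort(dist[i, same])[:k1]]:
            W[i, j] = W[j, i] = 1.0
    for c in np.unique(labels):
        in_c = np.where(labels == c)[0]
        out_c = np.where(labels != c)[0]
        d = dist[np.ix_(in_c, out_c)]
        order = np.argsort(d, axis=None)[:k2]
        rows, cols = np.unravel_index(order, d.shape)
        for r, s in zip(rows, cols):
            Wp[in_c[r], out_c[s]] = Wp[out_c[s], in_c[r]] = 1.0
    return W, Wp
```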

The first step of MFA: PCA projection. Project the data into a PCA subspace to avoid the singularity problem.

The second step of MFA: construct the intraclass compactness graph and the interclass separability (penalty) graph.

Marginal Fisher Analysis Intraclass compactness (intrinsic graph)

Marginal Fisher Analysis Interclass separability (penalty graph)

The third step of MFA: Marginal Fisher Criterion. Find the projection that minimizes the ratio of intraclass compactness to interclass separability, w* = argmin_w [wᵀX(D − W)Xᵀw] / [wᵀX(Dᵖ − Wᵖ)Xᵀw].

The fourth step of MFA: output the final linear projection by combining the PCA projection with w*.

LDA vs. MFA. The number of available projection directions in MFA is much greater than in LDA. There is no assumption on the data distribution of each class. The interclass margin in MFA can better characterize the separability of different classes than the interclass variance in LDA.

Kernel MFA. The distance between two samples is computed in the kernel-induced feature space: d(x_i, x_j) = sqrt(k(x_i, x_i) + k(x_j, x_j) − 2k(x_i, x_j)). For a new data point x, its projection onto the derived optimal direction is obtained from the kernel expansion coefficients α: F(x) = Σ_i α_i k(x, x_i) (up to normalization).
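
A hedged sketch of these two kernel computations; the Gaussian kernel and the variable names (alpha, X_train) are assumptions, not the paper's notation.

```python
# Sketch of the kernel distance and projection used in Kernel MFA.
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2 * sigma ** 2))

def kernel_distance(xi, xj, kernel=rbf_kernel):
    """Distance between two samples in the kernel-induced feature space."""
    return np.sqrt(kernel(xi, xi) + kernel(xj, xj) - 2.0 * kernel(xi, xj))

def kernel_projection(x, X_train, alpha, kernel=rbf_kernel):
    """Projection of a new point x onto w = sum_i alpha_i * phi(x_i)."""
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X_train))
```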

Tensor MFA

Experiments: Face Recognition (XM2VTS, CMU PIE, ORL) and a Non-Gaussian Case.

Experiments XM2VTS, PIE-1, PIE-2, ORL
