Optimal Dimensionality of Metric Space for kNN Classification
Wei Zhang, Xiangyang Xue, Zichen Sun, Yuefei Guo, and Hong Lu
Dept. of Computer Science & Engineering, FUDAN University, Shanghai, China

2 Outline
Motivation
 Related Work
 Main Idea
Proposed Algorithm
 Discriminant Neighborhood Embedding
 Dimensionality Selection Criterion
Experimental Results
 Toy Datasets
 Real-world Datasets
Conclusions

3 Related Work
Many recent techniques have been proposed to learn a more appropriate metric space to improve the performance of learning and data mining algorithms, for example:
 Relevant Component Analysis, Bar-Hillel, A., et al., ICML 2003
 Locality Preserving Projections, He, X., et al., NIPS 2003
 Neighborhood Components Analysis, Goldberger, J., et al., NIPS 2004
 Marginal Fisher Analysis, Yan, S., et al., CVPR 2005
 Local Discriminant Embedding, Chen, H.-T., et al., CVPR 2005
 Local Fisher Discriminant Analysis, Sugiyama, M., ICML 2006
 ……
However, the target dimensionality of the new space is selected empirically in the above-mentioned approaches.

4 Main Idea
Given finite labeled multi-class samples, what can we do to improve the performance of kNN classification?
 Can we learn a low-dimensional embedding such that, among the k nearest neighbors of each point, points in the same class have smaller distances to each other than to points in different classes?
 Can we estimate the optimal dimensionality of the new metric space at the same time?
[Figure: Original Space (D=2) vs. New Space (d=1)]

5 Outline
Motivation
 Related Work
 Main Idea
Proposed Algorithm
 Discriminant Neighborhood Embedding
 Dimensionality Selection Criterion
Experimental Results
 Toy Datasets
 Real-world Datasets
Conclusions

6 Setup
N labeled multi-class points: {(x_i, c_i)}, i = 1, ..., N, with x_i ∈ R^D and class label c_i
k nearest neighbors of x_i in the same class: Neig^I(i)
k nearest neighbors of x_i in the other classes: Neig^E(i)
Discriminant adjacency matrix F:
 F_ij = +1 if x_j ∈ Neig^I(i) or x_i ∈ Neig^I(j)
 F_ij = -1 if x_j ∈ Neig^E(i) or x_i ∈ Neig^E(j)
 F_ij = 0 otherwise
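The construction above can be made concrete with a short sketch. This is an illustrative NumPy implementation of the definition, not the authors' code; the function name discriminant_adjacency and the choice of brute-force distance computation are assumptions made for the example.

```python
import numpy as np

def discriminant_adjacency(X, y, k=3):
    """Build the discriminant adjacency matrix F described above.

    X : (N, D) array of samples, y : (N,) array of class labels.
    F[i, j] = +1 if one point is among the other's k same-class nearest
    neighbors, -1 if it is among the other's k other-class nearest
    neighbors, and 0 otherwise.
    """
    N = X.shape[0]
    # pairwise squared Euclidean distances, with self-distances excluded
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D2, np.inf)
    F = np.zeros((N, N))
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        intra = same[np.argsort(D2[i, same])[:k]]   # k nearest same-class points
        extra = diff[np.argsort(D2[i, diff])[:k]]   # k nearest other-class points
        F[i, intra] = F[intra, i] = +1              # symmetric, matching the "or" in the definition
        F[i, extra] = F[extra, i] = -1
    return F
```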

7 Objective Function
Intra-class compactness in the new space: Φ_I(P) = Σ_{i,j: F_ij = +1} ||P^T x_i - P^T x_j||²
Inter-class separability in the new space: Φ_E(P) = Σ_{i,j: F_ij = -1} ||P^T x_i - P^T x_j||²
Overall: minimize Φ(P) = Φ_I(P) - Φ_E(P) = Σ_{i,j} F_ij ||P^T x_i - P^T x_j||² = 2 tr(P^T X(S - F)X^T P), subject to P^T P = I
(S is a diagonal matrix whose entries are column sums of F)
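For reference, the trace form used on the next slide follows from a standard graph-Laplacian identity; a short sketch, assuming F is symmetric and S is the diagonal matrix of its column sums:

```latex
% Graph-Laplacian identity behind the trace form (F symmetric, S_{ii} = \sum_j F_{ij}):
\Phi(P) = \sum_{i,j} F_{ij}\,\lVert P^{\top}x_i - P^{\top}x_j \rVert^{2}
        = 2\sum_{i} S_{ii}\, x_i^{\top} P P^{\top} x_i
          - 2\sum_{i,j} F_{ij}\, x_i^{\top} P P^{\top} x_j
        = 2\,\operatorname{tr}\!\bigl(P^{\top} X (S - F) X^{\top} P\bigr).
```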

8 How to Compute P
Note
 The matrix X(S-F)X^T is symmetric, but not positive definite; it may have negative, zero, or positive eigenvalues
 The optimal transformation P is obtained from the eigenvectors of X(S-F)X^T corresponding to all of its d negative eigenvalues:
P* = arg min_{P^T P = I} tr(P^T X(S - F)X^T P)
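A minimal sketch of this eigen-step, assuming the adjacency matrix from the earlier sketch and a data matrix X stored with one column per sample; np.linalg.eigh is used because X(S-F)X^T is symmetric:

```python
import numpy as np

def dne_projection(X, F):
    """Eigen-step sketched above.

    X : (D, N) data matrix, one column per sample.
    F : (N, N) symmetric discriminant adjacency matrix with entries +1/-1/0.
    Returns the eigenvectors of X (S - F) X^T whose eigenvalues are negative
    (most negative first), together with those eigenvalues.
    """
    S = np.diag(F.sum(axis=0))            # diagonal matrix of column sums of F
    M = X @ (S - F) @ X.T                 # symmetric but generally indefinite
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    keep = eigvals < 0                    # keep only the negative directions
    return eigvecs[:, keep], eigvals[keep]
```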

9 What does the Positive/Negative Eigenvalue Mean?
The i-th eigenvector P_i corresponds to the i-th eigenvalue λ_i, where
 d_i^I : the total kNN pairwise distance in the same class along P_i
 d_i^E : the total kNN pairwise distance in different classes along P_i
and λ_i ∝ d_i^I - d_i^E, so a negative eigenvalue means same-class neighbors stay closer than other-class neighbors along P_i, while a positive eigenvalue means the opposite.
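Concretely, using the same quadratic-form identity as above, each eigenvalue measures the intra-class versus inter-class spread of squared projected distances along its eigenvector; a sketch in this notation:

```latex
% For a unit eigenvector P_i of X(S-F)X^T with eigenvalue \lambda_i:
\lambda_i = P_i^{\top} X (S - F) X^{\top} P_i
          = \tfrac{1}{2} \sum_{j,k} F_{jk}\,\bigl(P_i^{\top} x_j - P_i^{\top} x_k\bigr)^{2}
          = \tfrac{1}{2}\bigl(d_i^{I} - d_i^{E}\bigr),
% where d_i^I sums squared projected distances over same-class kNN pairs and
% d_i^E over other-class kNN pairs, so \lambda_i < 0 exactly when same-class
% neighbors are closer than other-class neighbors along P_i.
```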

10 Choosing the Leading Negative Eigenvalues
Among all the negative eigenvalues, some have much larger absolute values than the others; those with small absolute values can be ignored.
We then choose the t (t < d) negative eigenvalues with the largest absolute values such that their cumulative share of the total absolute eigenvalue mass, Σ_{i=1}^{t} |λ_i| / Σ_{i=1}^{d} |λ_i|, exceeds a given threshold.
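A small helper along these lines could pick t from the negative eigenvalues; the default 0.95 ratio below is an assumed illustrative threshold, not a value given in the slides:

```python
import numpy as np

def select_leading(neg_eigvals, ratio=0.95):
    """Choose t negative eigenvalues with the largest absolute values whose
    cumulative share of the total absolute eigenvalue mass reaches `ratio`."""
    mags = np.sort(np.abs(neg_eigvals))[::-1]   # largest magnitudes first
    cum = np.cumsum(mags) / mags.sum()          # cumulative eigenvalue curve
    t = int(np.argmax(cum >= ratio)) + 1        # smallest t reaching the ratio
    return t
```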

11 Learned Mahalanobis Distance
In the original space, the distance between any pair of points can be obtained by
d(x_i, x_j) = ||P^T x_i - P^T x_j|| = sqrt( (x_i - x_j)^T P P^T (x_i - x_j) ),
i.e., a Mahalanobis distance with metric matrix M = P P^T.
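One can either evaluate this learned Mahalanobis distance directly or project the data once with P and run ordinary Euclidean kNN in the reduced space; the two views are equivalent. A small sketch of the former (dne_distance is a hypothetical helper name, not from the paper):

```python
import numpy as np

def dne_distance(xi, xj, P):
    """Learned Mahalanobis distance with metric matrix M = P P^T."""
    diff = xi - xj
    return float(np.sqrt(diff @ (P @ P.T) @ diff))
```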

12 Outline
Motivation
 Related Work
 Main Idea
Proposed Algorithm
 Discriminant Neighborhood Embedding
 Dimensionality Selection Criterion
Experimental Results
 Toy Datasets
 Real-world Datasets
Conclusions

13 Three Classes of Well-Clustered Data
Both eigenvalues are negative and comparable in magnitude.
No dimensionality reduction is needed.

14 Two Classes of Data with a Multimodal Distribution
There is a big difference between the two negative eigenvalues.
Only the leading eigenvector P_1, corresponding to λ_1, will be kept.

15 Three Classes of Data
The two eigenvectors correspond to a positive and a negative eigenvalue, respectively.
The eigenvector with the positive eigenvalue should be discarded from the point of view of kNN classification.

16 Five Classes of Non-separable Data
Both eigenvalues are positive, which means that kNN classification cannot perform well in either the original space or the new space.

17 UCI Sonar Dataset
While the added eigenvalues are < 0, accuracy increases with dimensionality.
When the eigenvalues approach 0, the optimum is reached.
When the added eigenvalues are > 0, performance decreases.
[Figure: classification accuracy vs. dimensionality, with the cumulative eigenvalue curve]
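A sweep like the one summarized above could be reproduced roughly as follows. This sketch assumes X and y are already loaded (dataset loading is omitted), reuses the hypothetical helpers from the earlier sketches, and only walks through the negative-eigenvalue directions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def accuracy_vs_dimension(X, y, k=3, seed=0):
    """kNN accuracy in the learned space as a function of the number of
    retained (negative-eigenvalue) dimensions."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=seed)
    F = discriminant_adjacency(Xtr, ytr, k=k)
    P, _ = dne_projection(Xtr.T, F)          # columns ordered: most negative first
    accs = []
    for d in range(1, P.shape[1] + 1):
        Pd = P[:, :d]                         # keep the d leading directions
        knn = KNeighborsClassifier(n_neighbors=k).fit(Xtr @ Pd, ytr)
        accs.append(knn.score(Xte @ Pd, yte))
    return accs
```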

18 Comparisons with the State-of-the-Art

19 UMIST Face Database

20 Comparisons with the State-of-the-Art (UMIST Face Database)

21 Outline
Motivation
 Related Work
 Main Idea
The Proposed Algorithm
 Discriminant Neighborhood Embedding
 Dimensionality Selection Criterion
Experimental Results
 Toy Datasets
 Real-world Datasets
Conclusions

22 Conclusions
Summary
 A low-dimensional embedding can be LEARNED for better kNN classification accuracy given finite training samples
 The optimal dimensionality can be estimated at the same time
Future work
 For large-scale datasets, how can the computational complexity be reduced?

Thanks for your Attention! Any questions?