Using Manifold Structure for Partially Labeled Classification

Presentation transcript:

Using Manifold Structure for Partially Labeled Classification, by Belkin and Niyogi, NIPS 2002. Presented by Chunping Wang, Machine Learning Group, Duke University, November 16, 2007.

Outline: Motivations, Algorithm Description, Theoretical Interpretation, Experimental Results, Comments.

Motivations (1): Why is manifold structure useful? Data lies on a lower-dimensional manifold, so dimension reduction is preferable. An example: a handwritten digit 0. Usually the dimensionality is the number of pixels, which is typically very high (256 here). Ideally, 5-dimensional features would suffice [slide figure: a handwritten 0 annotated with quantities d1, d2, f1*, f2*]; in reality the intrinsic dimensionality is somewhat higher, but perhaps no more than several dozen.

Motivations (2): Why is manifold structure useful? Data representation in the original space is unsatisfactory. [Slide figure: labeled and unlabeled points, shown both in the original space and in a 2-d representation obtained with Laplacian Eigenmaps.]
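
The 2-d view mentioned on that slide can be reproduced with an off-the-shelf Laplacian Eigenmaps implementation. The sketch below is my own illustration, not the presenter's code; it assumes scikit-learn is available and uses a random placeholder matrix X in place of real data.

```python
# Minimal sketch (not from the slides): embed data into 2-d with Laplacian Eigenmaps.
# Assumes scikit-learn is installed; X is a placeholder for a real (k, d) data matrix.
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))                  # stand-in for, e.g., flattened digit images

embedder = SpectralEmbedding(n_components=2,     # 2-d representation, as in the slide figure
                             affinity="nearest_neighbors",
                             n_neighbors=8)      # n-nearest-neighbor graph
X_2d = embedder.fit_transform(X)                 # (500, 2) coordinates, ready to scatter-plot
```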

Algorithm Description (1): Semi-supervised classification. Given k points x_1, ..., x_k, of which the first s are labeled with y_i in {-1, +1} (s < k) in the binary case.
Step 1, constructing the adjacency graph: set W_ij = 1 if x_i is among the n nearest neighbors of x_j or x_j is among the n nearest neighbors of x_i, and W_ij = 0 otherwise.
Step 2, eigenfunctions: compute the eigenvectors e_1, ..., e_p corresponding to the p smallest eigenvalues of the graph Laplacian L = D − W, where D is the diagonal degree matrix with D_ii = Σ_j W_ij.
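
A sketch of steps 1 and 2 in NumPy/SciPy follows. It is my own illustration under the stated setup (Euclidean nearest neighbors, dense matrices), not the authors' implementation, and the function name laplacian_eigenvectors is mine.

```python
# Minimal sketch (mine, not the authors' code) of steps 1-2.
# X: (k, d) array of points; n: number of nearest neighbors; p: number of eigenvectors kept.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenvectors(X, n=8, p=10):
    k = X.shape[0]
    dist = cdist(X, X)                         # pairwise Euclidean distances
    nn = np.argsort(dist, axis=1)[:, 1:n + 1]  # indices of the n nearest neighbors (skip self)
    W = np.zeros((k, k))
    for i in range(k):
        W[i, nn[i]] = 1.0                      # connect i to its n nearest neighbors
    W = np.maximum(W, W.T)                     # "i among j's neighbors OR j among i's"
    D = np.diag(W.sum(axis=1))                 # degree matrix
    L = D - W                                  # graph Laplacian
    _, eigvecs = eigh(L)                       # eigenvalues returned in ascending order
    return eigvecs[:, :p]                      # e_1, ..., e_p for the p smallest eigenvalues
```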

Algorithm Description (2): Semi-supervised classification, continued (k points, the first s labeled, s < k, binary case).
Step 3, building the classifier: minimize the error function Err(a) = Σ_{i=1..s} ( y_i − Σ_{j=1..p} a_j e_j(i) )^2 over the space of coefficients a = (a_1, ..., a_p); the solution is the least-squares estimate a = (E_lab^T E_lab)^{-1} E_lab^T y, where E_lab is the s-by-p matrix with entries e_j(i) for the labeled points and y = (y_1, ..., y_s)^T.
Step 4, classifying unlabeled points (i > s): assign y_i = sign( Σ_{j=1..p} a_j e_j(i) ).
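
Steps 3 and 4 amount to an ordinary least-squares fit on the labeled rows of the eigenvector matrix followed by a sign decision. The sketch below is my own; the function name classify and its interface are assumptions, not the authors' code.

```python
# Minimal sketch (mine) of steps 3-4: least-squares coefficients from the labeled points,
# then the sign of the fitted value classifies every point.
import numpy as np

def classify(E, y_labeled):
    """E: (k, p) matrix of Laplacian eigenvectors; y_labeled: (s,) array of +/-1 labels
    for the first s points. Returns +/-1 predictions for all k points."""
    s = y_labeled.shape[0]
    a, *_ = np.linalg.lstsq(E[:s], y_labeled, rcond=None)   # minimizes Err(a) above
    f = E @ a                                                # fitted values on every point
    return np.where(f >= 0, 1, -1)
```

With the earlier sketch, a binary run would look like E = laplacian_eigenvectors(X, n=8, p=20) followed by predictions = classify(E, y[:s]).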

Theoretical Interpretation (1): For a manifold M, the eigenfunctions of its Laplacian Δ form a basis for the Hilbert space L^2(M), i.e., any function f in L^2(M) can be written as f = Σ_i a_i e_i, with the eigenfunctions satisfying Δ e_i = λ_i e_i. The simplest nontrivial example: the manifold is the unit circle S^1, where this expansion is exactly the Fourier series.
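
Written out for the circle (my own filling-in, using standard Fourier facts rather than anything shown on the slide):

```latex
% On S^1 the Laplacian is \Delta f = -f'', so \Delta e = \lambda e has the solutions
%   e_0(\theta) = 1 (\lambda = 0),  \sin(n\theta) and \cos(n\theta) (\lambda = n^2),
% and expanding f in this eigenfunction basis is exactly the classical Fourier series:
\[
  f(\theta) = a_0 + \sum_{n \ge 1} \bigl( b_n \sin(n\theta) + c_n \cos(n\theta) \bigr).
\]
```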

Theoretical Interpretation (2): Smoothness measure S(f): a small S(f) means f is "smooth". For the unit circle S^1, S(f) = ∫_0^{2π} |f'(θ)|^2 dθ; generally, S(f) = ∫_M ‖∇f‖^2 dμ = ∫_M f Δf dμ, so S(e_i) = λ_i for the eigenfunctions. Smaller eigenvalues correspond to smoother eigenfunctions (lower frequency); e_0, with λ_0 = 0, is a constant function. In terms of the smoothest p eigenfunctions, the approximation of an arbitrary function f is f ≈ Σ_{i=1..p} a_i e_i.
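
The step from the smoothness functional to the eigenvalues can be spelled out with one integration by parts (my own filling-in; it assumes a closed manifold and L^2-normalized eigenfunctions):

```latex
% Why S(e_i) = \lambda_i, assuming \int_M e_i^2 \, d\mu = 1 and no boundary terms:
\[
  S(e_i) = \int_{\mathcal{M}} \lVert \nabla e_i \rVert^2 \, d\mu
         = \int_{\mathcal{M}} e_i \, \Delta e_i \, d\mu
         = \lambda_i \int_{\mathcal{M}} e_i^2 \, d\mu
         = \lambda_i .
\]
```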

Theoretical Interpretation (3): Back to our problem with a finite number of points: the graph Laplacian L = D − W stands in for the manifold Laplacian, its eigenvectors stand in for the eigenfunctions, and the least-squares fit of Algorithm Description (2) is the solution of the discrete version of this approximation problem. For binary classification, the function f takes only two possible values; for M-ary cases the only difference is that the number of possible values is more than two.
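
For the M-ary case, one natural realization (my own illustration; the slide only notes that the label set is larger than two) is to regress one indicator vector per class and take the argmax of the fitted scores:

```python
# Minimal sketch (mine) of an M-ary variant: one-vs-rest least squares on the
# eigenvector features, then argmax over the fitted class scores.
import numpy as np

def classify_multiclass(E, labels_first_s, num_classes):
    """E: (k, p) eigenvector matrix; labels_first_s: (s,) integer labels in {0, ..., M-1}."""
    s = labels_first_s.shape[0]
    Y = np.eye(num_classes)[labels_first_s]          # (s, M) one-hot indicator targets
    A, *_ = np.linalg.lstsq(E[:s], Y, rcond=None)    # one coefficient vector per class
    scores = E @ A                                   # (k, M) fitted class scores
    return scores.argmax(axis=1)                     # predicted class for every point
```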

Results (1): Handwritten digit recognition (MNIST data set): 60,000 28-by-28 gray-scale images; the first 100 principal components are used as features, and the number of eigenvectors is set to p = 20% of k. [Results plot from the slide not reproduced in the transcript.]
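
A toy skeleton of such a run, reusing the helper functions sketched above (mine, not the presenter's or the authors' experiment code; the neighbor count n = 8 and the subset size are arbitrary assumptions, and the random arrays stand in for the real MNIST data):

```python
# Toy-scale skeleton (mine) of an MNIST-style experiment: 100 principal components,
# p = 20% of k eigenvectors, labels known only for the first s points.
import numpy as np
from sklearn.decomposition import PCA

k, s = 2000, 100
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(k, 784))                     # placeholder for flattened 28x28 images
y = rng.integers(0, 10, size=k)                       # placeholder digit labels

X = PCA(n_components=100).fit_transform(X_raw)        # first 100 principal components (as on the slide)
E = laplacian_eigenvectors(X, n=8, p=int(0.2 * k))    # p = 20% of k (as on the slide); n is assumed
pred = classify_multiclass(E, y[:s], num_classes=10)
error = (pred[s:] != y[s:]).mean()                    # error rate on the unlabeled points
```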

Results (2): Text classification (20 Newsgroups data set): 19,935 vectors with dimensionality 6,000, with p = 20% of k. [Results plot from the slide not reproduced in the transcript.]

Comments: This semi-supervised algorithm essentially converts the original problem into a linear regression problem in a new space of lower dimensionality. That regression problem is solved by standard least-squares estimation. Only n nearest neighbors are considered for each data point, so the adjacency matrix is sparse and the cost of the eigendecomposition is reduced. Little additional computation is expended after the dimensionality reduction. More comments ……