Exploring Intrinsic Structures from Samples: Supervised, Unsupervised, and Semisupervised Frameworks Supervised by Prof. Xiaoou Tang & Prof. Jianzhuang Liu

Outline
Notations & introduction
Dimensionality reduction
Trace Ratio Optimization: preserve sample feature structures
Tensor Subspace Learning
Correspondence Propagation: explore the geometric structures and feature domain relations concurrently

Outline
Trace Ratio Optimization: preserve sample feature structures during the dimensionality reduction process
A Convergent Solution to Tensor Subspace Learning
Semisupervised Regression on Multiclass Data: explore the feature structures w.r.t. different classes
Correspondence Propagation: explore the geometric structures and feature domain relations concurrently

Concept: Tensor
A tensor is a multi-dimensional (or multi-way) array of components.

Concept: Tensor. Applications
Real-world data are affected by many factors. For person identification, we may have facial images under different
► views and poses
► lighting conditions
► expressions
The observed data evolve differently along the variation of different factors, e.g., image columns and rows.

Concept: Tensor. Applications
It is desirable to uncover the intrinsic connections among the different factors affecting the data. The tensor provides a concise and effective representation.
(Figure: a tensor of images indexed by illumination, pose, expression, image columns, and image rows.)

Concept: Dimensionality Reduction
Preserve sample feature structures; enhance classification capability; reduce computational complexity.

Preserve sample feature structures during the dimensionality reduction process

Trace Ratio Optimization. Definition
max_W tr(W^T S_p W) / tr(W^T S_l W)  s.t.  W^T W = I,  where S_p and S_l are positive semidefinite.
Homogeneous property: the ratio is invariant to a nonzero scaling of W (and to a right orthogonal transform of W).
Special case: when W is a vector, the objective reduces to the generalized Rayleigh quotient, which is solved by GEVD.
Orthogonality constraint W^T W = I: optimization over the Grassmann manifold.
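The following is a minimal numpy sketch of evaluating this objective; S_p, S_l, and the toy data below are illustrative placeholders rather than the scatter matrices used in the slides.

```python
import numpy as np

def trace_ratio(W, S_p, S_l):
    """Trace ratio objective tr(W^T S_p W) / tr(W^T S_l W)."""
    return np.trace(W.T @ S_p @ W) / np.trace(W.T @ S_l @ W)

# Toy positive semidefinite matrices and a column-orthogonal W (W^T W = I).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
B = rng.standard_normal((10, 10))
S_p, S_l = A @ A.T, B @ B.T + 1e-3 * np.eye(10)
W, _ = np.linalg.qr(rng.standard_normal((10, 3)))
print(trace_ratio(W, S_p, S_l))
```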

Trace Ratio Formulation: Linear Discriminant Analysis
max_W tr(W^T S_b W) / tr(W^T S_w W)  s.t.  W^T W = I, with between-class scatter S_b and within-class scatter S_w.

Trace Ratio Formulation: Kernel Discriminant Analysis
Decompose the projection in the span of the kernel-mapped samples; the objective then remains a trace ratio, expressed in terms of the Gram matrix and a coefficient matrix.

Trace Ratio Formulation: Marginal Fisher Analysis
Intra-class graph (intrinsic graph); inter-class graph (penalty graph).
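As an illustration of the two graphs, here is a hedged numpy sketch of how MFA-style intrinsic and penalty graphs could be built; the neighborhood sizes k1 and k2 and the 0/1 weights are assumptions of the example, not the exact construction from the slides.

```python
import numpy as np

def mfa_graphs(X, y, k1=3, k2=5):
    """Sketch of MFA-style graphs: an intra-class (intrinsic) kNN graph and an
    inter-class (penalty) kNN graph, both returned as symmetric 0/1 matrices."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W_intra = np.zeros((n, n))
    W_inter = np.zeros((n, n))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        # k1 nearest neighbors within the same class
        for j in same[np.argsort(dist[i, same])[:k1]]:
            W_intra[i, j] = W_intra[j, i] = 1.0
        # k2 nearest neighbors from other classes (marginal pairs)
        for j in diff[np.argsort(dist[i, diff])[:k2]]:
            W_inter[i, j] = W_inter[j, i] = 1.0
    return W_intra, W_inter
```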

Trace Ratio Formulation: Kernel Marginal Fisher Analysis
As in KDA, decompose the projection in the span of the kernel-mapped samples, so the trace ratio is expressed in terms of the Gram matrix and a coefficient matrix.

Trace Ratio Formulation: 2D Linear Discriminant Analysis and Discriminant Analysis with Tensor Representation
Left projection & right projection; fix one projection matrix and optimize the other.

Trace Ratio Formulation Tensor Subspace Analysis

Trace Ratio Formulation: Conventional Solution, GEVD
Suffers from the singularity problem of the denominator scatter matrix; related remedies include Nullspace LDA and Dualspace LDA.

From Trace Ratio to Trace Difference: Preprocessing
Remove the common null space of the scatter matrices with Principal Component Analysis.

From Trace Ratio to Trace Difference: what will we do?
Objective: max_{W^T W = I} tr(W^T S_p W) / tr(W^T S_l W).
Define λ_t = tr(W_t^T S_p W_t) / tr(W_t^T S_l W_t); then the trace ratio is converted into the trace difference g(W) = tr(W^T (S_p - λ_t S_l) W).
Find W_{t+1} = argmax_{W^T W = I} g(W), so that the objective value does not decrease.

From Trace Ratio to Trace Difference: what will we do?
Constraint: W^T W = I. Let W_{t+1} consist of the leading eigenvectors of S_p - λ_t S_l.
We then have tr(W_{t+1}^T (S_p - λ_t S_l) W_{t+1}) ≥ tr(W_t^T (S_p - λ_t S_l) W_t) = 0, thus λ_{t+1} ≥ λ_t: the objective rises monotonically.

Strict Monotonicity and Convergence
If the objective value does not increase from step t to t+1, then W_{t+1} and W_t differ only by an orthogonal matrix; meanwhile, both have been calculated from the leading eigenvectors of the projected samples, so they span the same space. Hence the algorithm is strictly monotonic, and it is guaranteed to converge.

Main Algorithm
1: Initialization. Initialize W_0 as an arbitrary column-orthogonal matrix.
2: Iterative optimization. For t = 1, 2, ..., Tmax, do
   1. Set λ_t = tr(W_{t-1}^T S_p W_{t-1}) / tr(W_{t-1}^T S_l W_{t-1}).
   2. Conduct eigenvalue decomposition of S_p - λ_t S_l.
   3. Reshape the projection directions: take the leading eigenvectors as the columns of W_t.
3: Output the projection matrix.
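A compact numpy sketch of this iterative trace-ratio procedure is given below (eigen-decomposition of S_p - λ_t S_l followed by taking the leading eigenvectors); the variable names and convergence tolerance are my own, and S_p, S_l stand for the scatter/Laplacian matrices of the chosen formulation (LDA, MFA, ...).

```python
import numpy as np

def solve_trace_ratio(S_p, S_l, d, n_iter=100, tol=1e-8, seed=0):
    """Iteratively maximize tr(W^T S_p W) / tr(W^T S_l W) s.t. W^T W = I.
    Each step converts the ratio into the trace difference S_p - lambda * S_l
    and takes its d leading eigenvectors as the new projection."""
    n = S_p.shape[0]
    W, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, d)))
    lam = -np.inf
    for _ in range(n_iter):
        lam_new = np.trace(W.T @ S_p @ W) / np.trace(W.T @ S_l @ W)
        vals, vecs = np.linalg.eigh(S_p - lam_new * S_l)   # symmetric EVD
        W = vecs[:, np.argsort(vals)[::-1][:d]]            # d leading eigenvectors
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return W, lam
```

Each iteration needs only one symmetric eigen-decomposition, which matches the efficiency claim made later in the slides.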

Highlights of the Trace Ratio based Algorithm
A new solution to the discriminant analysis algorithms.
The derived projection matrix is orthogonal.
Enhanced potential classification capability of the derived low-dimensional representation from the subspace learning algorithms.

A Convergent Solution to Tensor Subspace Learning

Tensor Subspace Learning Algorithms: Traditional Tensor Discriminant Algorithms
Tensor Subspace Analysis (He et al.), Two-Dimensional Linear Discriminant Analysis (Ye et al.), Discriminant Analysis with Tensor Representation (Yan et al.).
They project the tensor along different dimensions or ways; the projection matrices for the different dimensions are derived iteratively, each step solving a trace ratio optimization problem. They DO NOT CONVERGE!

Tensor Subspace Learning Algorithms: Graph Embedding, a General Framework
An undirected intrinsic graph G = {X, W} is constructed to represent the pairwise similarities over the sample data. A penalty graph or a scale normalization term is constructed to impose extra constraints on the transform.
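For concreteness, here is a small sketch of the graph-embedding scatter such a framework typically uses (the Laplacian quadratic form of the intrinsic graph); the row-wise sample layout is an assumption of the example.

```python
import numpy as np

def graph_scatter(X, W_graph):
    """Graph-embedding scatter S = X^T L X with L = D - W, so that
    tr(V^T S V) = 0.5 * sum_ij W_ij * ||V^T x_i - V^T x_j||^2
    (X holds the samples as rows, W_graph is the intrinsic adjacency matrix)."""
    L = np.diag(W_graph.sum(axis=1)) - W_graph
    return X.T @ L @ X
```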

Discriminant Analysis Objective
Solve the projection matrices iteratively: leave one projection matrix as the variable while keeping the others constant. There is no closed-form solution; each step works on the mode-k unfolding of the tensor.
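The mode-k unfolding and the mode-k projection used in this alternating scheme can be written in a few lines of numpy; the helper names and the toy shapes are mine.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: axis `mode` becomes the rows, the remaining axes are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, U, mode):
    """Mode-k product: project axis `mode` of tensor T with matrix U of shape (d, n_k)."""
    return np.moveaxis(np.tensordot(U, T, axes=([1], [mode])), 0, mode)

# Example: a 3rd-order tensor (rows x columns x images) projected along its column mode.
T = np.random.default_rng(0).standard_normal((32, 24, 100))
U = np.random.default_rng(1).standard_normal((5, 24))
print(unfold(T, 1).shape, mode_multiply(T, U, 1).shape)  # (24, 3200) (32, 5, 100)
```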

Discriminant Analysis Objective: Deduction
Trace ratio: a general formulation for the objectives of the discriminant-analysis-based algorithms.
DATER: between-class scatter vs. within-class scatter of the unfolded data.
TSA: diagonal weight matrix constructed from the image manifold.

Why do previous algorithms not converge? Disagreement between the objective and the optimization process: GEVD solves the ratio trace, and the conversion from trace ratio to ratio trace induces an inconsistency among the objectives of the different dimensions!

From Trace Ratio to Trace Difference: what will we do?
Objective (per mode): a trace ratio of the mode-k scatter matrices of the unfolded data, under a column-orthogonality constraint.
Define a common λ from the current projections; then the trace ratio is converted into the trace difference of the mode-k scatters. Find the mode-k projection that maximizes it, so that the shared objective value does not decrease.

From Trace Ratio to Trace Difference: what will we do?
Constraint: column-orthogonal projections. The new projection for each mode consists of the leading eigenvectors of the corresponding trace-difference matrix, so the objective rises monotonically. Projection matrices of the different dimensions share the same objective.

Main Algorithm
1: Initialization. Initialize the projection matrices as arbitrary column-orthogonal matrices.
2: Iterative optimization. For t = 1, 2, ..., Tmax, do; for k = 1, 2, ..., n, do
   1. Set λ from the current projections.
   2. Compute the mode-k scatter matrices of the projected, unfolded data.
   3. Conduct eigenvalue decomposition of their trace difference.
   4. Reshape the projection directions: take the leading eigenvectors as the new mode-k projection matrix.
3: Output the projection matrices.

Strict Monotonicity and Convergence
If the objective value does not increase at an iteration, the successive projection matrices differ only by an orthogonal matrix; meanwhile, both are calculated from the leading eigenvectors of the projected samples, so they span the same space. Hence the algorithm is strictly monotonic.
Theorem [Meyer, 1976]: assume the algorithm Ω is strictly monotonic with respect to J and generates a sequence that lies in a compact set; if the space is normed, the sequence converges. Therefore the algorithm is guaranteed to converge.

Highlights of the Trace Ratio based Algorithm
The objective value is guaranteed to increase monotonically, and the multiple projection matrices are proved to converge.
Only eigenvalue decomposition is applied in the iterative optimization, which makes the algorithm extremely efficient.
Enhanced potential classification capability of the derived low-dimensional representation from the subspace learning algorithms.
The first work to give a convergent solution to general tensor-based subspace learning.

Experimental Results: Monotonicity of the Objective
Figure: objective error vs. iteration number, and the trace difference (which indicates the gap to the optimum) vs. iteration number. (a-b) FERET database; (c-d) CMU PIE database.

Projection Visualization Experimental Results Visualization of the projection matrix W of PCA, ratio trace based LDA, and trace ratio based LDA (ITR) on the FERET database.

Experimental Results: Face Recognition, Linear
Comparison: trace ratio based LDA vs. ratio trace based LDA (PCA+LDA); trace ratio based MFA vs. ratio trace based MFA (PCA+MFA).

Experimental Results: Face Recognition, Kernelization
Trace ratio based KDA vs. ratio trace based KDA; trace ratio based KMFA vs. ratio trace based KMFA.

Recognition error rates over dimensions Experimental Results Recognition error rates over different dimensions. The configuration is N3T7 on the CMU PIE database. For LDA and MFA, the dimension of the preprocessing PCA step is N-Nc.

Experimental Results: UCI Datasets
Testing classification errors on three UCI databases for both linear and kernel-based algorithms. Results are obtained from 100 realizations of randomly generated 70/30 splits of the data.

Experimental Results: Monotonicity of the Objective & Projection Matrix Convergence

Experimental Results: Face Recognition
1. TMFA TR mostly outperforms all the other methods considered in this work, with only one exception for the case G5P5 on the CMU PIE database.
2. For vector-based algorithms, the trace ratio based formulation is consistently superior to the ratio trace based one for subspace learning.
3. Tensor representation has the potential to improve the classification performance for both trace ratio and ratio trace formulations of subspace learning.

Summary
A novel iterative procedure was proposed to directly optimize the objective function of general subspace learning based on tensor representation.
The convergence of the projection matrices and the monotonicity of the objective function value were proven.
This is the first work to give a convergent solution for general tensor-based subspace learning.

Correspondence Propagation Geometric Structures & Feature Structures Explore the geometric structures and feature domain consistency for object registration

Objective: Aim
Objects are represented as sets of feature points.
Exploit the geometric structures of the sample features; introduce human interaction for correspondence guidance; seek a mapping of features between sets of different cardinalities.

Objective: Motivation
Pursue a closed-form solution; introduce human interaction for correspondence guidance; express feature similarity as a bipartite similarity graph; encode the geometric feature distribution with spatial graphs.

Graph Construction: Spatial Graph and Similarity Graph

From Spatial Graph to Categorical Product Graph: Assignment Neighborhood Definition
Suppose u, u' are vertices of graph G_1 and v, v' are vertices of graph G_2. Two assignments (u, v) and (u', v') are neighbors iff both pairs (u, u') and (v, v') are neighbors in G_1 and G_2, respectively.

From Spatial Graph to Categorical Product Graph
The adjacency matrix of the product graph can be derived as the Kronecker product of the two spatial adjacency matrices, where ⊗ is the matrix Kronecker product operator. Smoothness along the spatial distribution is then measured on this product graph.
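A minimal numpy sketch of this construction, using two tiny hand-made adjacency matrices as stand-ins for the real spatial graphs:

```python
import numpy as np

# Spatial adjacency matrices of the two feature-point sets (toy 0/1 graphs).
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
A2 = np.array([[0, 1],
               [1, 0]])

# Categorical product graph: assignments (i, a) and (j, b) are neighbors
# iff i~j in graph 1 and a~b in graph 2, which is exactly the Kronecker product.
A_prod = np.kron(A1, A2)
print(A_prod.shape)  # (6, 6): one row/column per candidate assignment
```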

Feature Domain Consistency & Soft Constraints
Similarity measure between corresponding features; one-to-one correspondence penalty, expressed with the matrix Hadamard product and the sum of all elements of the resulting matrix.

Assignment Labeling
Labeled assignments: reliable correspondences & inhomogeneous pairs.
Inhomogeneous pair labeling: assign zeros to the pairs with extremely low similarity scores.
Reliable pair labeling: assign ones to the reliable pairs.

Reliable Correspondence Propagation: Arrangement
The assignment variables are arranged into a vector; the coefficient matrices and the spatial adjacency matrices are arranged accordingly.

Reliable Correspondence Propagation: Objective
The objective combines feature domain agreement, a geometric smoothness regularization, and a one-to-one correspondence penalty.

Reliable Correspondence Propagation: Solution
Relax the assignment variables to the real domain; the resulting quadratic objective admits a closed-form solution.
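The exact closed-form expression is not recoverable from this transcript (it was shown as a figure), but structurally the relaxed problem is an unconstrained quadratic in the assignment vector. Purely as an illustration, assuming a generic form min_f f^T Q f - 2 b^T f with a positive definite Q built from the smoothness, agreement, and penalty terms, the minimizer is the solution of a linear system:

```python
import numpy as np

def solve_relaxed_assignment(Q, b):
    """Minimize f^T Q f - 2 b^T f over real-valued f (assuming Q is positive definite);
    setting the gradient 2 Q f - 2 b to zero gives the linear system Q f = b."""
    return np.linalg.solve(Q, b)
```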

Rearrangement and Discretization
Inverse process of the element arrangement: reshape the assignment vector into a matrix.
Thresholding: assignments larger than a threshold are regarded as correspondences.
Eliciting: sequentially pick the assignments with the largest assignment scores.
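A short sketch of the "eliciting" strategy (greedy one-to-one extraction of the highest-scoring assignments); the score matrix below is a made-up example.

```python
import numpy as np

def elicit_correspondences(S):
    """Greedy one-to-one discretization: repeatedly take the largest remaining
    assignment score and block its row and column."""
    S = S.astype(float).copy()
    matches = []
    for _ in range(min(S.shape)):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        matches.append((i, j))
        S[i, :] = -np.inf
        S[:, j] = -np.inf
    return matches

scores = np.array([[0.9, 0.2, 0.1],
                   [0.3, 0.8, 0.4]])
print(elicit_correspondences(scores))  # [(0, 0), (1, 1)]
```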

Semi-supervised & Unsupervised Frameworks
Exact pairwise correspondence labeling: users give exact correspondence guidance.
Obscure correspondence guidance: rough correspondence of image parts.

Experimental Results. Demonstration

Experiment. Dataset

Experimental Results. Details Automatic feature matching score on the Oxford real image transformation dataset. The transformations include viewpoint change ((a) Graffiti and (b) Wall sequence), image blur ((c) bikes and (d) trees sequence), zoom and rotation ((e) bark and (f) boat sequence), illumination variation ((g) leuven ) and JPEG compression ((h) UBC).

Summary
An efficient feature matching framework that transduces a certain number of reliable correspondences to the remaining ones.
Easy to switch from the semi-supervised framework to an automatic system when combined with some simple approaches.
Naturally extended to incorporate human interaction.
Both geometric smoothness and feature agreement are considered.

Future Works
From point-to-point correspondence to set-to-set correspondence.
Multi-scale correspondence searching.
Combining object segmentation and registration.

Publications:
[1] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘A Convergent Solution to Tensor Subspace Learning’, International Joint Conference on Artificial Intelligence (IJCAI 07, regular paper), Jan. 2007.
[2] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Trace Ratio vs. Ratio Trace for Dimensionality Reduction’, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07), Jun. 2007.
[3] Huan Wang, Shuicheng Yan, Thomas Huang, Jianzhuang Liu and Xiaoou Tang, ‘Transductive Regression Piloted by Inter-Manifold Relations’, International Conference on Machine Learning (ICML 07), Jun. 2007.
[4] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Maximum Unfolded Embedding: Formulation, Solution, and Application for Image Clustering’, ACM International Conference on Multimedia (ACM MM 07), Oct. 2007.
[5] Shuicheng Yan, Huan Wang, Thomas Huang and Xiaoou Tang, ‘Ranking with Uncertain Labels’, IEEE International Conference on Multimedia & Expo (ICME 07), May 2007.
[6] Shuicheng Yan, Huan Wang, Xiaoou Tang and Thomas Huang, ‘Exploring Feature Descriptors for Face Recognition’, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 07, oral), Apr. 2007.

Thank You!

Transductive Regression on Multi-Class Data Explore the intrinsic feature structures w.r.t. different classes for regression

Regression Algorithms: Review
Belkin et al., Regularization and Semi-supervised Learning on Large Graphs: exploits the manifold structure to guide the regression; transduces the function values from the labeled data to the unlabeled data using local neighborhood relations; global optimization for a robust prediction.
Cortes et al., On Transductive Regression: Tikhonov regularization in the Reproducing Kernel Hilbert Space (RKHS).
Fei Wang et al., Label Propagation Through Linear Neighborhoods: classification can be regarded as a special case of regression; an iterative procedure propagates the class labels within local neighborhoods and has been proved convergent; regression values are constrained to 0 and 1 (binary), with samples belonging to the corresponding class mapped to 1 and all others to 0; the convergence point can be deduced from the regularization framework.

The Problem We Are Facing
Age estimation w.r.t. different genders (FG-NET Aging Database).
Pose estimation w.r.t. different genders, illuminations, expressions, and persons (CMU-PIE dataset).

The Problem We Are Facing: Regression on Multi-Class Samples
Traditional algorithms: all samples are treated as belonging to the same class; samples close in the data space X are assumed to have similar function values (smoothness along the manifold).
Our setting: the class information is easy to obtain for the training data, so we utilize it in the training process to boost the performance; for an incoming sample, no class information is given.

The Problem: Difference from Multi-View Algorithms
Multi-view regression: one object can have multiple views, or multiple learners are employed for the same object; there exists a clear correspondence among the multiple learners, and the disagreement of different learners is penalized.
Multi-class regression: no explicit correspondence; the data of different classes may be obtained from different instances in our configuration, thus it is much more challenging. The class information is utilized in two ways: intra-class regularization & inter-class regularization.

TRIM: Assumptions & Notation
Samples from different classes lie within different sub-manifolds; samples from different classes share a similar distribution along their respective sub-manifolds.
Labels: function values for regression. Intra-manifold = intra-class; inter-manifold = inter-class.

TRIM: Intra-Manifold Regularization
Respective intrinsic graphs are built for the different sample classes; correspondingly, the intra-manifold regularization term for each class is calculated separately from its intrinsic graph (the slide gives the regularizer for p = 1 and p = 2).
It may not be proper to preserve smoothness between samples from different classes.
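A sketch of such a class-wise Laplacian regularizer (the p = 1 case), assuming one intrinsic adjacency matrix per class and a function-value vector f over all samples; the data layout and helper names are assumptions of the example.

```python
import numpy as np

def laplacian_penalty(f, W_graph):
    """Graph-Laplacian smoothness penalty f^T L f with L = D - W (the p = 1 case)."""
    L = np.diag(W_graph.sum(axis=1)) - W_graph
    return f @ L @ f

def intra_manifold_regularizer(f, y, class_graphs):
    """Sum the penalty over classes, each restricted to its own samples, so that
    smoothness is never imposed between samples of different classes."""
    total = 0.0
    for c, W_c in class_graphs.items():   # class label -> intrinsic adjacency matrix
        idx = np.where(y == c)[0]
        total += laplacian_penalty(f[idx], W_c)
    return total
```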

TRIM: Inter-Manifold Regularization
Assumption: samples with similar labels generally lie in similar relative positions on the corresponding sub-manifolds.
Motivation: 1. Align the sub-manifolds of the different sample classes according to the labeled points and the graph structures. 2. Derive the correspondences in the aligned space using a nearest-neighbor technique.

TRIM: Reinforced Landmark Correspondence
Initialize the inter-manifold graph using the ε-ball distance criterion on the sample labels; reinforce the inter-manifold connections by iteratively applying the reinforcement update. Only the sample pairs with the top 20% largest similarity scores are selected as landmark correspondences.

TRIM: Manifold Alignment
Minimize the correspondence error on the landmark points while holding the intra-manifold structures. An additional term acts as a global compactness regularization; it uses the Laplacian matrix of a graph whose weight is 1 if the two samples are of different classes and 0 otherwise.

TRIM: Inter-Manifold Regularization
Concatenate the derived inter-manifold graphs to form the inter-manifold Laplacian regularization.

TRIM: Objective
The objective combines a fitness term, the RKHS norm, the intra-manifold regularization, and the inter-manifold regularization.

TRIM: Solution
By the generalized representer theorem, the minimizer of the objective admits an expansion over kernel functions centered at the samples; thus the minimization over the Hilbert space boils down to minimizing over the coefficient vector. The minimizer is given in closed form, where K is the N × N Gram matrix of the labeled and unlabeled points over all the sample classes.
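The slide's exact coefficient formula is not recoverable here, but it follows the familiar pattern of Laplacian-regularized least squares (Belkin et al.). The sketch below shows that pattern with a single combined Laplacian L standing in for the intra- and inter-manifold terms; the regularization weights and function names are placeholders, not the precise TRIM minimizer.

```python
import numpy as np

def rkhs_coefficients(K, y, labeled_mask, L, gamma_A=1e-2, gamma_I=1e-2):
    """Minimize ||J (K a - y)||^2 + gamma_A * a^T K a + gamma_I * (K a)^T L (K a),
    where J selects the labeled samples; the minimizer solves
    (J K + gamma_A I + gamma_I L K) a = J y."""
    n = K.shape[0]
    J = np.diag(labeled_mask.astype(float))
    return np.linalg.solve(J @ K + gamma_A * np.eye(n) + gamma_I * L @ K, J @ y)

def predict(K_new, alpha):
    """Out-of-sample prediction f(x) = sum_i alpha_i K(x, x_i);
    K_new holds kernel values between new points (rows) and training points."""
    return K_new @ alpha
```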

TRIM: Generalization
For out-of-sample data, the labels can be estimated from the learned kernel expansion. Note that in this framework the class information of the incoming sample is not required at the prediction stage. A version without the kernel (the original linear formulation) is available as well.

Two Moons Experiments

Experiments: Age Estimation (YAMAHA Dataset)
TRIM vs. traditional graph-Laplacian-regularized regression on the training set of the YAMAHA database, and open-set evaluation of the kernelized regression on the YAMAHA database. (left) Regression on the training set. (right) Regression on out-of-sample data.

Summary
A new topic that is often met in applications but has received little attention.
Class information is utilized in the training stage to boost the performance, and the system does not require class information in the testing stage.
Intra-class and inter-class graphs are constructed and the corresponding regularizations are introduced.
The sub-manifolds of the different sample classes are aligned and labels are propagated among samples from different classes.