
1 Exploring Intrinsic Structures from Samples: Supervised, Unsupervised, and Semisupervised Frameworks Supervised by Prof. Xiaoou Tang & Prof. Jianzhuang Liu

2 Outline
Notations & introductions: dimensionality reduction
Trace Ratio Optimization: preserve sample feature structures
Tensor Subspace Learning
Correspondence Propagation: explore the geometric structures and feature domain relations concurrently

3 Outline
Trace Ratio Optimization: preserve sample feature structures during the dimensionality reduction process
A convergent solution to Tensor Subspace Learning
Semisupervised Regression on Multiclass Data: explore the feature structures w.r.t. different classes
Correspondence Propagation: explore the geometric structures and feature domain relations concurrently

4 Concept. Tensor A tensor is a multi-dimensional (or multi-way) array of components.

5 Application Concept. Tensor Real-world data are affected by multifarious factors. For person identification, we may have facial images of different ► views and poses ► lighting conditions ► expressions The observed data evolve differently along the variation of different factors ► image columns and rows

6 Application Concept. Tensor It is desirable to dig into the intrinsic connections among the different factors affecting the data. A tensor provides a concise and effective representation. (Tensor modes illustrated on the slide: images, image rows, image columns, illumination, pose, expression.)

7 Introduction Concept. Dimensionality Reduction Preserve sample feature structures Enhance classification capability Reduce the computational complexity

8 Preserve sample feature structures during the dimensionality reduction process

9 Trace Ratio Optimization. Definition
Objective: maximize Tr(Wᵀ A W) / Tr(Wᵀ B W) w.r.t. WᵀW = I, where A and B are positive semidefinite matrices (e.g., between-class and within-class scatter).
Homogeneous property: the objective is invariant to a scaling of W.
Special case: when W is a single vector, the objective becomes the generalized Rayleigh quotient wᵀAw / wᵀBw, solvable by GEVD.
With the orthogonality constraint, the problem is an optimization over the Grassmann manifold.
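For the vector special case mentioned above, here is a minimal sketch (not from the original slides) of solving the generalized Rayleigh quotient by GEVD with SciPy's symmetric generalized eigensolver; A and B stand for any pair of positive semidefinite matrices, such as between- and within-class scatter.

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh_quotient_direction(A, B, reg=1e-8):
    """Maximize (w^T A w) / (w^T B w) over single vectors w via a
    generalized eigenvalue decomposition (GEVD).  A and B are symmetric
    positive semidefinite; reg keeps B invertible."""
    d = A.shape[0]
    vals, vecs = eigh(A, B + reg * np.eye(d))  # solves A v = lambda (B + reg I) v
    w = vecs[:, np.argmax(vals)]               # leading generalized eigenvector
    return w / np.linalg.norm(w)
```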

10 Trace Ratio Formulation Linear Discriminant Analysis: maximize Tr(Wᵀ S_b W) / Tr(Wᵀ S_w W) w.r.t. WᵀW = I, where S_b and S_w are the between-class and within-class scatter matrices.
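As a concrete illustration of the LDA instance of the trace ratio, a sketch under the usual scatter definitions (not code from the thesis); X and y are hypothetical data and label arrays.

```python
import numpy as np

def lda_scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter matrices for
    the LDA trace ratio objective Tr(W^T S_b W) / Tr(W^T S_w W).
    X: (n_samples, n_features) data matrix; y: class labels."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)          # spread inside class c
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)            # spread of class means
    return S_b, S_w
```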

11 Trace Ratio Formulation Kernel Discriminant Analysis: map the samples into a kernel-induced feature space; the projection directions lie in the span of the mapped samples, so the objective decomposes through the kernel Gram matrix and becomes a trace ratio over the expansion coefficients.

12 Trace Ratio Formulation Marginal Fisher Analysis Intra-class graph (Intrinsic graph) Inter-class graph (Penalty graph)

13 Trace Ratio Formulation Kernel Marginal Fisher Analysis: the same kernel trick applies; decompose the objective through the kernel Gram matrix, so the marginal Fisher criterion becomes a trace ratio over the expansion coefficients.

14 Trace Ratio Formulation 2-D Linear Discriminant Analysis: left projection & right projection; fix one projection matrix and optimize the other. Discriminant Analysis with Tensor Representation extends the same idea to general tensors.

15 Trace Ratio Formulation Tensor Subspace Analysis

16 Trace Ratio Formulation Conventional solution: GEVD. It suffers from the singularity problem of the within-class scatter matrix; common workarounds are Nullspace LDA and Dualspace LDA.

17 From Trace Ratio to Trace Difference Preprocessing: remove the common null space of the scatter matrices with Principal Component Analysis.

18 From Trace Ratio to Trace Difference
Objective: maximize Tr(Wᵀ S_b W) / Tr(Wᵀ S_w W) subject to WᵀW = I.
Define λ_t as the trace ratio value attained by the current projection W_t.
Then the trace ratio problem is converted to a trace difference problem: find a W_{t+1} that maximizes Tr(Wᵀ (S_b − λ_t S_w) W), so that the objective value never decreases.

19 From Trace Ratio to Trace Difference
Constraint: WᵀW = I. Let W_{t+1} = [v_1, ..., v_d], where v_1, ..., v_d are the leading eigenvectors of S_b − λ_t S_w.
We have Tr(W_{t+1}ᵀ (S_b − λ_t S_w) W_{t+1}) ≥ Tr(W_tᵀ (S_b − λ_t S_w) W_t) = 0.
Thus λ_{t+1} ≥ λ_t: the objective rises monotonically!

20 Strict Monotony and Convergence
If the objective value does not increase between two iterations, the successive projection matrices differ only by an orthogonal matrix. Meanwhile, both have been calculated from the leading eigenvectors of the projected samples, so they span the same space. The algorithm is therefore strictly monotonic, and it is guaranteed to converge!

21 Main Algorithm Process
1: Initialization. Initialize W_0 as an arbitrary column-orthogonal matrix.
2: Iterative optimization. For t = 1, 2, ..., Tmax, do
1. Set λ_t to the current trace ratio value Tr(W_{t-1}ᵀ S_b W_{t-1}) / Tr(W_{t-1}ᵀ S_w W_{t-1}).
2. Conduct the eigenvalue decomposition of S_b − λ_t S_w.
3. Reshape the projection directions to form W_t from the leading eigenvectors.
4. Check the convergence of the objective value.
3: Output the projection matrix W*.
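A minimal NumPy sketch of the iterative trace-difference procedure summarized above; it assumes S_b and S_w have already been built (e.g., with the scatter helper shown earlier) and is an illustration, not the authors' implementation.

```python
import numpy as np

def trace_ratio_projection(S_b, S_w, d, max_iter=50, tol=1e-8):
    """Maximize Tr(W^T S_b W) / Tr(W^T S_w W) subject to W^T W = I
    by the trace-difference iteration.  Returns a column-orthogonal
    (D x d) projection matrix W and the final ratio value."""
    D = S_b.shape[0]
    W = np.linalg.qr(np.random.randn(D, d))[0]        # arbitrary orthogonal init
    lam = np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)
    for _ in range(max_iter):
        # trace-difference step: eigen-decompose S_b - lam * S_w
        vals, vecs = np.linalg.eigh(S_b - lam * S_w)
        W = vecs[:, np.argsort(vals)[::-1][:d]]       # d leading eigenvectors
        new_lam = np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)
        if abs(new_lam - lam) < tol:                  # monotone objective converged
            lam = new_lam
            break
        lam = new_lam
    return W, lam
```

Each sweep performs one eigenvalue decomposition of S_b − λS_w, which is exactly the step that keeps the objective non-decreasing.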

22 Highlights of the Trace Ratio based algorithm A new solution to the Discriminant Analysis algorithms. The derived projection matrix is orthogonal. Enhanced potential classification capability of the derived low-dimensional representation from the subspace learning algorithms.

23 A Convergent Solution to Tensor Subspace Learning

24 Tensor Subspace Learning algorithms Traditional tensor discriminant algorithms: Tensor Subspace Analysis (He et al.), Two-dimensional Linear Discriminant Analysis (Ye et al.), and Discriminant Analysis with Tensor Representation (Yan et al.). They project the tensor along different dimensions or ways; the projection matrices for different dimensions are derived iteratively; each step solves a trace ratio optimization problem. They DO NOT CONVERGE!

25 Tensor Subspace Learning algorithms Graph Embedding – a general framework An undirected intrinsic graph G={X,W} is constructed to represent the pairwise similarities over the sample data. A penalty graph or a scale normalization term is constructed to impose extra constraints on the transform.

26 Discriminant Analysis Objective Solve the projection matrices iteratively: leave one projection matrix as the variable while keeping the others constant. There is no closed-form solution; each step works on the mode-k unfolding of the tensor (see the sketch below).
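The mode-k unfolding and mode-k product referred to above can be written in a few lines of NumPy; this is a generic sketch using one common unfolding convention, with an arbitrarily chosen example tensor shape.

```python
import numpy as np

def mode_k_unfold(T, k):
    """Mode-k unfolding: mode k indexes the rows, all remaining modes
    are flattened into the columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_k_product(T, U, k):
    """Mode-k product T x_k U: project mode k of tensor T by matrix U,
    where U has shape (new_dim, T.shape[k])."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, k)), 0, k)

# e.g. a 3rd-order tensor holding 20 images of size 32 x 32
T = np.random.randn(32, 32, 20)
print(mode_k_unfold(T, 1).shape)                              # (32, 640)
print(mode_k_product(T, np.random.randn(10, 32), 0).shape)    # (10, 32, 20)
```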

27 Objective Deduction Discriminant Analysis Objective
Trace Ratio: a general formulation for the objectives of the Discriminant Analysis based algorithms.
DATER: the ratio of the between-class scatter to the within-class scatter of the mode-k unfolded data.
TSA: the same trace ratio form, with a diagonal matrix of weights constructed from the image manifold.

28 Disagreement between the Objective and the Optimization Process Why do previous algorithms not converge? They rely on GEVD: the conversion from Trace Ratio to Ratio Trace induces an inconsistency among the objectives of different dimensions!

29 From Trace Ratio to Trace Difference
Objective: maximize Tr(U_kᵀ S_b^k U_k) / Tr(U_kᵀ S_w^k U_k) for each mode k, subject to U_kᵀU_k = I.
Define λ as the current global trace ratio value.
Then the trace ratio problem is converted to a trace difference problem: for each mode, find the U_k that maximizes Tr(U_kᵀ (S_b^k − λ S_w^k) U_k), so that the shared objective never decreases.

30 From Trace Ratio to Trace Difference
Constraint: U_kᵀU_k = I. Let U_k be formed from the leading eigenvectors of S_b^k − λ S_w^k.
We then have Tr(U_kᵀ (S_b^k − λ S_w^k) U_k) ≥ 0, so the objective rises monotonically.
Projection matrices of different dimensions share the same objective.

31 Main Algorithm Process
1: Initialization. Initialize U_1, ..., U_n as arbitrary column-orthogonal matrices.
2: Iterative optimization. For t = 1, 2, ..., Tmax, do: For k = 1, 2, ..., n, do
1. Set λ to the current trace ratio value.
2. Compute S_b^k and S_w^k from the mode-k unfoldings of the partially projected samples.
3. Conduct the eigenvalue decomposition of S_b^k − λ S_w^k.
4. Reshape the projection directions to form U_k from the leading eigenvectors.
5. Check the convergence of the objective value.
3: Output the projection matrices U_1*, ..., U_n*.
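A compact sketch of the alternating loop above, reusing the mode_k_unfold / mode_k_product helpers defined earlier; it performs one trace-difference eigen step per mode, but it is an illustrative reimplementation under those assumptions rather than the thesis code.

```python
import numpy as np

def tensor_trace_ratio(samples, labels, dims, n_sweeps=10, seed=0):
    """Alternating trace-difference optimization of the mode-k projection
    matrices U_1..U_n.  samples: list of equally-shaped numpy tensors;
    labels: class label per sample; dims: target size for each mode."""
    shape = samples[0].shape
    n_modes = samples[0].ndim
    rng = np.random.default_rng(seed)
    U = [np.linalg.qr(rng.standard_normal((shape[k], dims[k])))[0]
         for k in range(n_modes)]                      # arbitrary orthogonal init
    labels = np.asarray(labels)
    for _ in range(n_sweeps):
        for k in range(n_modes):
            # project every sample along all modes except k, then unfold mode k
            proj = []
            for T in samples:
                P = T
                for m in range(n_modes):
                    if m != k:
                        P = mode_k_product(P, U[m].T, m)
                proj.append(mode_k_unfold(P, k))
            proj = np.stack(proj)
            mean_all = proj.mean(axis=0)
            S_b = np.zeros((shape[k], shape[k]))
            S_w = np.zeros((shape[k], shape[k]))
            for c in np.unique(labels):
                Pc = proj[labels == c]
                mean_c = Pc.mean(axis=0)
                diff = mean_c - mean_all
                S_b += Pc.shape[0] * diff @ diff.T
                for A in Pc:
                    S_w += (A - mean_c) @ (A - mean_c).T
            # one trace-difference eigen step for mode k
            lam = np.trace(U[k].T @ S_b @ U[k]) / np.trace(U[k].T @ S_w @ U[k])
            vals, vecs = np.linalg.eigh(S_b - lam * S_w)
            U[k] = vecs[:, np.argsort(vals)[::-1][:dims[k]]]
    return U
```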

32 Strict Monotony and Convergence
If the objective value does not increase, the successive projection matrices differ only by an orthogonal matrix; meanwhile, both have been calculated from the leading eigenvectors of the projected samples, so they span the same space. The algorithm is therefore strictly monotonic.
Theorem [Meyer, 1976]: Assume that the algorithm Ω is strictly monotonic with respect to J and it generates a sequence which lies in a compact set. If the space is normed, then the distance between successive iterates tends to zero. The algorithm is guaranteed to converge!

33 Highlights of the Trace Ratio based algorithm The objective value is guaranteed to increase monotonically, and the multiple projection matrices are proved to converge. Only the eigenvalue decomposition method is applied in the iterative optimization, which makes the algorithm extremely efficient. Enhanced potential classification capability of the derived low-dimensional representation from the subspace learning algorithms. The first work to give a convergent solution to general tensor-based subspace learning.

34 Monotony of the Objective Experimental Results Figure: objective value vs. iteration number, and the trace difference (which indicates the gap from the optimum) vs. iteration number. (a-b) FERET database; (c-d) CMU PIE database.

35 Projection Visualization Experimental Results Visualization of the projection matrix W of PCA, ratio trace based LDA, and trace ratio based LDA (ITR) on the FERET database.

36 Face Recognition Results. Linear Experimental Results Comparison: Trace Ratio based LDA vs. the Ratio Trace based LDA (PCA+LDA) Comparison: Trace Ratio based MFA vs. the Ratio Trace based MFA (PCA+MFA)

37 Face Recognition Results. Kernelization Experimental Results Trace Ratio based KDA vs. the Ratio Trace based KDA Trace Ratio based KMFA vs. the Ratio Trace based KMFA

38 Recognition error rates over dimensions Experimental Results Recognition error rates over different dimensions. The configuration is N3T7 on the CMU PIE database. For LDA and MFA, the dimension of the preprocessing PCA step is N − N_c (the number of samples minus the number of classes).

39 Results on UCI Dataset Experimental Results Testing classification errors on three UCI databases for both linear and kernel- based algorithms. Results are obtained from 100 realizations of randomly generated 70/30 splits of data.

40 Monotony of the Objective & Projection Matrix Convergence Experimental Results

41 Face Recognition Results Experimental Results 1. TMFA TR mostly outperforms all the other methods concerned in this work, with only one exception for the case G5P5 on the CMU PIE database. 2. For vector-based algorithms, the trace ratio based formulation is consistently superior to the ratio trace based one for subspace learning. 3. Tensor representation has the potential to improve the classification performance for both trace ratio and ratio trace formulations of subspace learning.

42 Summary A novel iterative procedure was proposed to directly optimize the objective function of general subspace learning based on tensor representation. The convergence of the projection matrices and the monotony property of the objective function value were proven. This is the first work to give a convergent solution for general tensor-based subspace learning.

43 Correspondence Propagation Geometric Structures & Feature Structures Explore the geometric structures and feature domain consistency for object registration

44 Objective. Aim: exploit the geometric structures of sample features; introduce human interaction for correspondence guidance; seek a mapping of features between sets of different cardinalities. Objects are represented as sets of feature points.

45 Objective. Motivation: pursue a closed-form solution; introduce human interaction for correspondence guidance; express feature similarity as a bipartite similarity graph; encode the geometric feature distribution with spatial graphs.

46 Graph Construction Spatial Graph & Similarity Graph

47 From Spatial Graph to Categorical Product Graph Assignment Neighborhood Definition: Suppose V_1 and V_2 are the vertex sets of graphs G_1 and G_2, respectively. Two assignments (u_1, u_2) and (w_1, w_2) are neighbors iff u_1 and w_1 are neighbors in G_1 and u_2 and w_2 are neighbors in G_2, where "neighbors" means the two vertices are connected in the corresponding spatial graph.

48 From Spatial Graph to Categorical Product Graph The adjacency matrix of the product graph can be derived as the Kronecker product of the two spatial adjacency matrices, where ⊗ is the matrix Kronecker product operator. Smoothness along the spatial distribution is then measured through the Laplacian of this product graph.
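A tiny NumPy illustration of how the product-graph adjacency and the smoothness term can be formed; the toy adjacency matrices are assumed for the example only.

```python
import numpy as np

# Spatial adjacency matrices of the two feature-point sets (assumed here;
# in practice they come from, e.g., k-nearest-neighbor relations).
W1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]], dtype=float)       # graph G1, 3 points
W2 = np.array([[0, 1],
               [1, 0]], dtype=float)          # graph G2, 2 points

# Adjacency of the categorical product graph: two assignments are
# adjacent iff both component pairs are adjacent in their own graphs.
W_prod = np.kron(W1, W2)                      # (3*2) x (3*2)

# Graph Laplacian of the product graph; x^T L x measures how smoothly
# the assignment scores x vary along the spatial structure.
D = np.diag(W_prod.sum(axis=1))
L = D - W_prod
x = np.random.rand(W_prod.shape[0])           # a vector of assignment scores
smoothness = x @ L @ x
```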

49 Feature Domain Consistency & Soft Constraints Similarity measure: feature-domain agreement between candidate correspondences. One-to-one correspondence penalty: written with the matrix Hadamard product and an operator that returns the sum of all elements of the resulting matrix.

50 Assignment Labeling Labeled assignments: reliable correspondences & inhomogeneous pairs. Inhomogeneous pair labeling: assign zeros to those pairs with extremely low similarity scores. Reliable pair labeling: assign ones to those reliable pairs.

51 Reliable Correspondence Propagation. Arrangement: stack the assignment variables into a single vector; the coefficient matrices and spatial adjacency matrices are rearranged accordingly.

52 Reliable Correspondence Propagation. Objective: the sum of three terms: feature domain agreement, geometric smoothness regularization, and the one-to-one correspondence penalty.

53 Reliable Correspondence Propagation. Solution: relax the assignment variables to the real domain; the resulting quadratic objective admits a closed-form solution, obtained by propagating the labeled assignment scores to the unlabeled ones through a linear system.
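A sketch of such a relaxed closed-form step: it clamps the labeled assignments and solves the remaining scores from the stationarity condition of a simplified quadratic objective (feature agreement plus product-graph smoothness; the one-to-one penalty is omitted for brevity, so this is not the thesis' exact formula).

```python
import numpy as np

def propagate_assignments(L, s, labeled_idx, labeled_val, alpha=1.0, beta=0.1):
    """Minimize  beta * x^T L x - alpha * s^T x  over the unlabeled
    assignment scores, with the labeled entries of x clamped.
    L: Laplacian of the product graph; s: feature-similarity scores;
    labeled_idx / labeled_val: indices and 0/1 values of labeled pairs."""
    n = L.shape[0]
    x = np.zeros(n)
    x[labeled_idx] = labeled_val
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    Luu = L[np.ix_(unlabeled, unlabeled)]
    Lul = L[np.ix_(unlabeled, labeled_idx)]
    # stationarity: 2*beta*(Luu x_u + Lul x_l) - alpha * s_u = 0
    rhs = alpha * s[unlabeled] / 2.0 - beta * (Lul @ x[labeled_idx])
    x[unlabeled] = np.linalg.solve(beta * Luu + 1e-9 * np.eye(len(unlabeled)), rhs)
    return x
```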

54 Rearrangement and Discretization Inverse process of the element arrangement: reshape the assignment vector back into a matrix. Thresholding: assignments larger than a threshold are regarded as correspondences. Eliciting: sequentially pick the assignments with the largest assignment scores.
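The two discretization strategies can be sketched as follows; this is a generic illustration in which score_matrix stands for the reshaped assignment matrix.

```python
import numpy as np

def discretize_assignments(score_matrix, threshold=None):
    """Turn real-valued assignment scores back into correspondences.
    Thresholding: keep every pair whose score exceeds the threshold.
    Eliciting: greedily pick the best remaining pair, removing its row
    and column each time, so the result is one-to-one."""
    S = score_matrix.astype(float)
    if threshold is not None:
        return list(zip(*np.where(S > threshold)))
    matches = []
    for _ in range(min(S.shape)):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if not np.isfinite(S[i, j]):
            break
        matches.append((i, j))
        S[i, :] = -np.inf               # remove the matched row and column
        S[:, j] = -np.inf
    return matches
```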

55 Semi-supervised & Unsupervised Frameworks Exact pairwise correspondence labeling: users give exact correspondence guidance. Obscure correspondence guidance: rough correspondence of image parts.

56 Experimental Results. Demonstration

57 Experiment. Dataset

58 Experimental Results. Details Automatic feature matching score on the Oxford real image transformation dataset. The transformations include viewpoint change ((a) Graffiti and (b) Wall sequence), image blur ((c) bikes and (d) trees sequence), zoom and rotation ((e) bark and (f) boat sequence), illumination variation ((g) leuven ) and JPEG compression ((h) UBC).

59 Summary An efficient feature matching framework that transduces a certain number of reliable correspondences to the remaining ones. Easy to switch from the semi-supervised framework to an automatic system when combined with some simple approaches. Naturally extended to incorporate human interactions. Both geometric smoothness and feature agreements are considered.

60 Summary Future Works From point-to-point correspondence to set-to-set correspondence. Multi-scale correspondence searching.

61 Summary Future Works From point-to-point correspondence to set-to-set correspondence. Multi-scale correspondence searching. Combine the object segmentation and registration.

62 Publications:
[1] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘A Convergent Solution to Tensor Subspace Learning’, International Joint Conference on Artificial Intelligence (IJCAI 07, regular paper), Jan. 2007.
[2] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Trace Ratio vs. Ratio Trace for Dimensionality Reduction’, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07), Jun. 2007.
[3] Huan Wang, Shuicheng Yan, Thomas Huang, Jianzhuang Liu and Xiaoou Tang, ‘Transductive Regression Piloted by Inter-Manifold Relations’, International Conference on Machine Learning (ICML 07), Jun. 2007.
[4] Huan Wang, Shuicheng Yan, Thomas Huang and Xiaoou Tang, ‘Maximum Unfolded Embedding: Formulation, Solution, and Application for Image Clustering’, ACM International Conference on Multimedia (ACM MM07), Oct. 2006.
[5] Shuicheng Yan, Huan Wang, Thomas Huang and Xiaoou Tang, ‘Ranking with Uncertain Labels’, IEEE International Conference on Multimedia & Expo (ICME 07), May 2007.
[6] Shuicheng Yan, Huan Wang, Xiaoou Tang and Thomas Huang, ‘Exploring Feature Descriptors for Face Recognition’, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 07, oral), Apr. 2007.

63 Thank You!

64 Transductive Regression on Multi-Class Data Explore the intrinsic feature structures w.r.t. different classes for regression

65 Regression Algorithms. Reviews
Belkin et al., Regularization and Semi-supervised Learning on Large Graphs: exploits the manifold structure to guide the regression; transduces the function values from the labeled data to the unlabeled ones utilizing local neighborhood relations; global optimization for a robust prediction.
Cortes et al., On Transductive Regression: Tikhonov regularization on the Reproducing Kernel Hilbert Space (RKHS).
The classification problem can be regarded as a special version of regression. Fei Wang et al., Label Propagation Through Linear Neighborhoods: an iterative procedure propagates the class labels within local neighborhoods and has been proved convergent; the regression values are constrained to 0 and 1 (binary), samples belonging to the corresponding class => 1, otherwise => 0; the convergence point can be deduced from the regularization framework.

66 The Problem We Are Facing Age estimation w.r.t. different genders (FG-NET Aging Database). Pose estimation w.r.t. different genders, illuminations, expressions, and persons (CMU-PIE dataset).

67 The Problem We Are Facing Regression on multi-class samples. Traditional algorithms: all samples are considered as belonging to the same class; samples close in the data space X are assumed to have similar function values (smoothness along the manifold). Our setting: the class information is easy to obtain for the training data, so utilize it in the training process to boost the performance; for the incoming sample, no class information is given.

68 The Problem. Difference from Multi-View Algorithms
Multi-view regression: one object can have multiple views, or multiple learners are employed for the same object; there exists a clear correspondence among the multiple learners, and the disagreement of different learners is penalized.
Multi-class regression (ours): no explicit correspondence; the data of different classes may be obtained from different instances in our configuration, thus it is much more challenging. The class information is utilized in two ways: intra-class regularization & inter-class regularization.

69 TRIM. Assumption & Notation Samples from different classes lie within different sub-manifolds. Samples from different classes share a similar distribution along their respective sub-manifolds. Labels: function values for regression. Intra-manifold = intra-class, inter-manifold = inter-class.

70 TRIM. Intra-Manifold Regularization Respective intrinsic graphs are built for the different sample classes; correspondingly, the intra-manifold regularization term for each class is calculated separately from its own intrinsic graph (a Laplacian-type penalty, e.g., with power p = 1 or p = 2). It may not be proper to preserve smoothness between samples from different classes.
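A small sketch of the per-class intrinsic graphs, using scikit-learn's k-NN graph builder purely for illustration; the intra-manifold regularizer is then the sum over classes of f_cᵀ L_c f_c, evaluated class by class so smoothness is never enforced across classes.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def intra_manifold_laplacians(X, y, n_neighbors=5):
    """Build a separate k-NN intrinsic graph for each class and return
    the per-class graph Laplacians."""
    laplacians = {}
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_neighbors, len(Xc) - 1)            # guard small classes
        W = kneighbors_graph(Xc, k, mode='connectivity',
                             include_self=False).toarray()
        W = np.maximum(W, W.T)                        # symmetrize the adjacency
        laplacians[c] = np.diag(W.sum(axis=1)) - W
    return laplacians
```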

71 TRIM. Inter-Manifold Regularization Assumption: samples with similar labels lie generally in similar relative positions on the corresponding sub-manifolds. Motivation: 1. Align the sub-manifolds of the different class samples according to the labeled points and the graph structures. 2. Derive the correspondences in the aligned space using a nearest-neighbor technique.

72 TRIM. Reinforced Landmark Correspondence Initialize the inter-manifold graph using the ε-ball distance criterion on the sample labels. Reinforce the inter-manifold connections by an iterative update. Only the sample pairs with the top 20% largest similarity scores are selected as landmark correspondences.

73 TRIM. Manifold Alignment Minimize the correspondence error on the landmark points while holding the intra-manifold structures. An additional term acts as a global compactness regularization; its Laplacian matrix is built from a graph whose weight is 1 if the two samples are of different classes and 0 otherwise.

74 TRIM. Inter-Manifold Regularization Concatenate the derived inter-manifold graphs to form the inter-manifold Laplacian regularization term.

75 TRIM. Objective The objective is the sum of a fitness (data fidelity) term, an RKHS norm penalty, the intra-manifold regularization, and the inter-manifold regularization.

76 TRIM. Solution The minimizer of the objective admits an expansion over the samples (generalized representer theorem): f*(·) = Σ_i α_i k(·, x_i). Thus the minimization over the Hilbert space boils down to minimizing over the coefficient vector α. The minimizer is given in closed form by solving a linear system in α, where K is the N × N Gram matrix of the labeled and unlabeled points over all the sample classes.
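Since the exact closed-form coefficients are not reproduced above, here is a LapRLS-style sketch of a representer-theorem solution with a single combined Laplacian; TRIM's actual minimizer uses separate intra- and inter-manifold terms, so this is only an analogue under those assumptions.

```python
import numpy as np

def kernel_graph_regression(K, y_labeled, labeled_idx, L,
                            gamma_A=1e-3, gamma_I=1e-2):
    """Closed-form expansion coefficients for kernel regression with an
    RKHS norm penalty and a graph-Laplacian regularizer.
    K: (N, N) Gram matrix over labeled + unlabeled samples;
    L: (N, N) combined graph Laplacian; labeled_idx: labeled indices."""
    N = K.shape[0]
    J = np.zeros((N, N))
    J[labeled_idx, labeled_idx] = 1.0          # selects the labeled samples
    y = np.zeros(N)
    y[labeled_idx] = y_labeled
    # minimize ||J(K a - y)||^2 + gamma_A a^T K a + gamma_I a^T K L K a
    A = J @ K + gamma_A * np.eye(N) + gamma_I * L @ K
    alpha = np.linalg.solve(A, y)
    return alpha                               # f(x) = sum_i alpha_i k(x, x_i)
```

In-sample predictions are then K @ alpha, and an out-of-sample point only needs its kernel values against the training samples, which mirrors the generalization slide that follows.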

77 TRIM. Generalization For out-of-sample data, the labels can be estimated using the learned expansion f(x) = Σ_i α_i k(x, x_i). Note that in this framework the class information of the incoming sample is not required in the prediction stage. An original (linear) version without the kernel is also available.

78 Two Moons Experiments

79 Experiments. Age Dataset (YAMAHA) TRIM vs. traditional graph Laplacian regularized regression on the YAMAHA database: (left) regression on the training set; (right) open-set evaluation of the kernelized regression on out-of-sample data.

80 Summary A new topic that is often met in applications but has received little attention. Class information is utilized in the training stage to boost the performance, and the system does not require class information in the testing stage. Intra-class and inter-class graphs are constructed and the corresponding regularizations are introduced. The sub-manifolds of different sample classes are aligned and labels are propagated among samples from different classes.

