
1 Feature Extraction Speaker: 虞台文

2 Content
Principal Component Analysis (PCA), Factor Analysis, Fisher's Linear Discriminant Analysis, Multiple Discriminant Analysis

3 Feature Extraction: Principal Component Analysis (PCA)

4 Principal Component Analysis
PCA is a linear procedure for finding the direction in input space where most of the energy (variance) of the input lies, and is used for feature extraction and dimension reduction. It is also called the (discrete) Karhunen-Loève transform, or the Hotelling transform.

5 The Basic Concept
Assume the data x (a random vector) has zero mean. PCA finds a unit vector w such that the projection w^T x retains the largest amount of variance of the data; that is, w maximizes E[(w^T x)^2] subject to ||w|| = 1.

6 The Method
The covariance matrix of the zero-mean data is C = E[x x^T]. Remark: C is symmetric and positive semidefinite.

7 The Method
Maximize w^T C w subject to w^T w = 1. By the method of Lagrange multipliers, define L(w, λ) = w^T C w − λ(w^T w − 1). The extreme point, say w*, satisfies ∂L/∂w = 0 at w = w*.

8 The Method
Maximize w^T C w subject to w^T w = 1. Setting ∂L/∂w = 2Cw − 2λw = 0 gives Cw = λw.

9 Discussion
At extreme points, Cw = λw and w^T C w = λ; that is, w is an eigenvector of C and λ is its corresponding eigenvalue. Let w1, w2, …, wd be the eigenvectors of C whose corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd. They are called the principal components of C, and their significance can be ordered according to their eigenvalues.

10 Discussion
Because C is symmetric and positive semidefinite, its eigenvectors w1, w2, …, wd are orthogonal (or can be chosen to be) and hence form a basis of the feature space. For dimensionality reduction, keep only a few of them, namely those with the largest eigenvalues.
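As an illustration of this procedure, here is a minimal NumPy sketch (not part of the original slides; the function name and the choice of k are illustrative): it estimates the covariance matrix C of zero-mean data, takes the eigenvectors with the largest eigenvalues as the principal components, and projects the data onto the top k of them.

```python
import numpy as np

def pca(X, k):
    """Project an n x d data matrix X onto its k principal components."""
    X = X - X.mean(axis=0)                 # make the data zero mean
    C = X.T @ X / X.shape[0]               # covariance matrix C = E[x x^T]
    eigvals, eigvecs = np.linalg.eigh(C)   # C is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]      # sort so that lambda_1 >= ... >= lambda_d
    W = eigvecs[:, order[:k]]              # top-k principal components w_1, ..., w_k
    return X @ W, W                        # projected data and the basis
```

Choosing k much smaller than d is what achieves the dimensionality reduction mentioned above.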

11 Applications
Image Processing, Signal Processing, Compression, Feature Extraction, Pattern Recognition

12 Example Projecting the data onto the most significant axis will facilitate classification. This also achieves dimensionality reduction.

13 Issues
The most significant component obtained with PCA need not be the most significant component for classification. PCA is effective for capturing the multivariate signal distribution, and hence is good for signal reconstruction, but it may be inappropriate for pattern classification.

14 Whitening
Whitening is a process that transforms a random vector, say x = (x1, x2, …, xn)^T (assumed to be zero mean), into z = (z1, z2, …, zn)^T with zero mean and identity covariance. z is said to be white or sphered. This implies that all of its elements are uncorrelated; however, it does not imply that they are independent.

15 Whitening Transform
Let V be a whitening transform, z = Vx. Decompose Cx as Cx = E D E^T, where D is a diagonal matrix of eigenvalues and E is an orthonormal matrix of eigenvectors. Set V = D^(−1/2) E^T; then Cz = V Cx V^T = I.
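A minimal sketch of this whitening transform (assuming Cx is nonsingular; the function name is illustrative):

```python
import numpy as np

def whiten(X):
    """Whitening transform z = V x with V = D^{-1/2} E^T."""
    X = X - X.mean(axis=0)                # zero-mean data, one sample per row
    Cx = X.T @ X / X.shape[0]             # covariance matrix Cx
    d, E = np.linalg.eigh(Cx)             # decompose Cx = E D E^T
    V = np.diag(1.0 / np.sqrt(d)) @ E.T   # V = D^{-1/2} E^T (requires Cx nonsingular)
    Z = X @ V.T                           # z = V x for each sample
    return Z, V                           # sample covariance of Z is the identity
```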

16 Whitening Transform
If V is a whitening transform and U is any orthonormal matrix, then UV (a rotation of V) is also a whitening transform.
Proof) With z = UVx, Cz = U V Cx V^T U^T = U I U^T = U U^T = I.

17 Why Whitening? With PCA, we usually choose several major eigenvectors as the basis for representation. This basis is efficient for reconstruction, but may be inappropriate for other applications, e.g., classification. By whitening, we can rotate the basis to get more interesting features.

18 Feature Extraction: Factor Analysis

19 What is a Factor? If several variables correlate highly, they might measure aspects of a common underlying dimension. These dimensions are called factors. Factors are classification axes along which the measures can be plotted. The greater the loading of variables on a factor, the more that factor explains intercorrelations between those variables.

20 Graph Representation: variables plotted against two factor axes, Quantitative Skill (F1) and Verbal (F2), with loadings ranging from −1 to +1 (figure omitted).

21 What is Factor Analysis?
A method for investigating whether a number of variables of interest Y1, Y2, …, Yn are linearly related to a smaller number of unobservable factors F1, F2, …, Fm. It is used for data reduction and summarization: a statistical approach to analyzing interrelationships among a large number of variables and explaining these variables in terms of their common underlying dimensions (factors).

22 Example
What factors influence students' grades? The grades are observable data; the underlying quantitative skill and verbal skill are unobservable factors.

23 The Model
y = Bf + ε, where y is the observation vector, B is the factor-loading matrix, f is the factor vector, and ε is the Gaussian noise vector.

24 The Model
Assume f has zero mean and identity covariance, ε has zero mean and diagonal covariance Q, and f and ε are uncorrelated. Then Cy = E[y y^T] = B B^T + Q.

25 The Model
In Cy = B B^T + Q, the right-hand side can be obtained from the model, while Cy itself can be estimated from the data.

26 The Model
The variance of each observed variable y_j splits into the communality Σ_k b_jk² (explained by the common factors) and the specific variance q_j (unexplained).

27 Example: decomposition of a sample covariance matrix as Cy ≈ B B^T + Q (numerical matrices omitted).
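To make the decomposition concrete, here is a small numeric sketch with a hypothetical two-factor loading matrix B and specific variances Q (the numbers are illustrative, not those on the slide); it builds Cy = B B^T + Q and reads off the communalities:

```python
import numpy as np

# Hypothetical loadings: 4 observed variables, 2 factors.
B = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.8]])
Q = np.diag([0.18, 0.32, 0.50, 0.32])   # specific (unexplained) variances

Cy = B @ B.T + Q                        # model covariance Cy = B B^T + Q
communality = (B**2).sum(axis=1)        # variance explained by the common factors
print(communality)                      # [0.82 0.68 0.5  0.68]
print(np.diag(Cy))                      # communality + specific variance = [1. 1. 1. 1.]
```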

28 Goal
Our goal is to choose B and Q to minimize the discrepancy between the sample covariance Cy and the model covariance B B^T + Q.

29 Uniqueness
Is the solution unique? There are an infinite number of solutions: if B* is a solution and T is an orthonormal transformation (rotation), then B*T is also a solution, since (B*T)(B*T)^T = B* T T^T B*^T = B* B*^T.

30 Example: the same Cy admits different factor-loading matrices. Which one is better?

31 Example
Left: each factor has nonzero loadings for all variables. Right: each factor controls different variables.

32 The Method
Determine the first set of loadings using the principal component method: take the m leading eigenpairs of Cy and set B = E_m D_m^(1/2), as sketched below.
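A minimal sketch of the principal component method for the initial loadings (the residual-diagonal estimate of Q is a common convention, not stated on the slides; names are illustrative):

```python
import numpy as np

def pc_loadings(Cy, m):
    """Initial factor loadings via the principal component method."""
    eigvals, eigvecs = np.linalg.eigh(Cy)        # Cy = E D E^T
    order = np.argsort(eigvals)[::-1][:m]        # the m largest eigenvalues
    E_m = eigvecs[:, order]
    D_m_sqrt = np.diag(np.sqrt(eigvals[order]))
    B = E_m @ D_m_sqrt                           # B = E_m D_m^{1/2}
    Q = np.diag(np.diag(Cy - B @ B.T))           # specific variances from the residual diagonal
    return B, Q
```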

33 Example: a numerical Cy and its principal-component factorization (matrices omitted).

34 Factor Rotation
Given a factor-loading matrix B and an orthonormal rotation matrix T, the factor rotation is B' = B T.

35 Factor Rotation
Rotation criteria: Varimax, Quartimax, Equimax, Orthomax, Oblimin. Each selects a rotation T of the factor-loading matrix, B' = B T.

36 Varimax
Criterion: maximize the variance of the squared loadings within each column of the rotated loading matrix, subject to the rotation matrix being orthonormal.

37 Varimax
Criterion: maximize the varimax objective subject to the orthonormality constraint; construct the Lagrangian that incorporates this constraint.

38-41 Varimax (derivation): the objective is rewritten in terms of the loadings b_jk and auxiliary quantities c_jk and d_k, column by column, and reaches its maximum once the iteration below converges (equations omitted).

42 Varimax
Goal: find the rotation that maximizes the varimax objective. Initially, obtain B0 by whatever method, e.g., PCA, and set T0 as the initial approximation of the rotation matrix, e.g., T0 = I. Then iterate: evaluate the update quantities from the current rotated loadings B_i = B0 T_i, find the next rotation T_(i+1), and stop when the rotation no longer changes; otherwise repeat.

43 Varimax
The same procedure as on the previous slide, with the rotation update obtained by pre-multiplying each side of the update equation by its transpose.

44-45 Varimax: restatement of the criterion to be maximized and its final form (equations omitted).
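As a concrete counterpart to the iteration above, here is a minimal sketch of the standard SVD-based varimax rotation; it is one common way to implement the procedure, not necessarily the exact update derived on the slides:

```python
import numpy as np

def varimax(B, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x m loading matrix B."""
    p, m = B.shape
    T = np.eye(m)                    # current rotation matrix (T0 = I)
    obj_prev = 0.0
    for _ in range(max_iter):
        L = B @ T                    # rotated loadings B T
        # Gradient-like matrix of the raw varimax objective.
        G = B.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                   # nearest orthonormal matrix (Procrustes step)
        obj = s.sum()
        if obj - obj_prev < tol:     # stop when the objective no longer improves
            break
        obj_prev = obj
    return B @ T, T
```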

46 Feature Extraction: Fisher's Linear Discriminant Analysis

47 Main Concept PCA seeks directions that are efficient for representation. Discriminant analysis seeks directions that are efficient for discrimination.

48 Classification Efficiencies on Projections

49 Criterion (Two-Category Case)
Samples of the two classes, with means m1 and m2, are projected onto a direction w (figure omitted).

50 Scatter
With ||w|| = 1, the between-class scatter of the projection is (w^T m1 − w^T m2)² = w^T S_B w, where S_B = (m1 − m2)(m1 − m2)^T is the between-class scatter matrix. The larger this scatter, the better.

51 Scatter
With ||w|| = 1, the within-class scatter of the projection is w^T S_W w, where S_W = S1 + S2 is the within-class scatter matrix and S_i = Σ_{x in class i} (x − m_i)(x − m_i)^T. The smaller this scatter, the better.

52 Goal
Define the generalized Rayleigh quotient J(w) = (w^T S_B w) / (w^T S_W w) and maximize it. The length of w is immaterial.

53 Generalized Eigenvector
To maximize J(w), w is the generalized eigenvector associated with the largest generalized eigenvalue; that is, S_B w = λ S_W w, or equivalently S_W^(−1) S_B w = λ w. The length of w is immaterial.

54 Proof
Setting ∇J(w) = 0 gives (w^T S_W w) S_B w = (w^T S_B w) S_W w; that is, S_B w = J(w) S_W w, or S_W^(−1) S_B w = J(w) w. Hence the maximizer of J(w) is the generalized eigenvector associated with the largest generalized eigenvalue.
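For the two-category case the sketch below applies; because S_B has rank one, the top generalized eigenvector reduces to w ∝ S_W^(−1)(m1 − m2) (assumes S_W is nonsingular; the function name is illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher discriminant direction for two classes (rows are samples)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # class-1 scatter
    S2 = (X2 - m2).T @ (X2 - m2)          # class-2 scatter
    S_W = S1 + S2                         # within-class scatter matrix
    # S_B = (m1 - m2)(m1 - m2)^T has rank one, so the top generalized
    # eigenvector is proportional to S_W^{-1} (m1 - m2).
    w = np.linalg.solve(S_W, m1 - m2)
    return w / np.linalg.norm(w)          # the length of w is immaterial
```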

55 Example: two-class data with means m1 and m2 projected onto candidate directions w; the Fisher direction separates the classes best (figure omitted).

56 Feature Extraction: Multiple Discriminant Analysis

57 Generalization of Fisher's Linear Discriminant
For the c-class problem, we seek a (c−1)-dimensional projection for efficient discrimination.

58 Scatter Matrices (Feature Space)
Total scatter matrix S_T = Σ_x (x − m)(x − m)^T; within-class scatter matrix S_W = Σ_i Σ_{x in class i} (x − m_i)(x − m_i)^T; between-class scatter matrix S_B = Σ_i n_i (m_i − m)(m_i − m)^T. They satisfy S_T = S_W + S_B.

59 The (c−1)-Dimensional Projection
The projection space is described using a d×(c−1) matrix W, with projected samples y = W^T x.

60 Scatter Matrices (Projection Space)
In the projected space, the total, within-class, and between-class scatter matrices become W^T S_T W, W^T S_W W, and W^T S_B W, respectively.

61 Criterion
Maximize J(W) = |W^T S_B W| / |W^T S_W W|. The columns of the optimal W are the generalized eigenvectors corresponding to the largest generalized eigenvalues of S_B w = λ S_W w.
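A minimal sketch of this criterion, solving the generalized eigenproblem S_B w = λ S_W w with SciPy's symmetric solver and keeping the top c−1 eigenvectors (assumes S_W is nonsingular; names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

def mda(X, labels, n_components=None):
    """Multiple discriminant analysis: project d-dim data to at most c-1 dims."""
    classes = np.unique(labels)
    n, d = X.shape
    n_components = n_components or len(classes) - 1
    m = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in classes:
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)               # within-class scatter
        S_B += len(Xk) * np.outer(mk - m, mk - m)    # between-class scatter
    # Generalized eigenproblem S_B w = lambda S_W w (S_W must be positive definite).
    eigvals, eigvecs = eigh(S_B, S_W)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X @ W, W
```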

