Presentation on theme: "Why reduce the number of features?" — Presentation transcript:

1 Why reduce the number of features? Having D features, we want to reduce their number to n, where n << D. Benefits: lower computational complexity, improved classification performance. Danger: possible loss of information.

2 Basic approaches to DR Feature extraction: a transform t : R^D → R^n creates a new feature space; the features lose their original meaning. Feature selection: selection of a subset of the original features.

3 Principal Component Transform (Karhunen-Loève) PCT belongs to feature extraction; t is a rotation y = Tx, where T is the matrix of eigenvectors of the original covariance matrix C_x. PCT creates D new uncorrelated features y with C_y = T' C_x T. The n features with the highest variances are kept.
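A minimal NumPy sketch of the transform just described, assuming the data come as an (N, D) matrix with one sample per row (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pct(X, n):
    """Principal Component Transform: keep the n components with the highest variance.
    X is an (N, D) data matrix with one sample per row."""
    Xc = X - X.mean(axis=0)                 # center the data
    Cx = np.cov(Xc, rowvar=False)           # D x D covariance matrix C_x
    eigvals, eigvecs = np.linalg.eigh(Cx)   # eigendecomposition (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]       # sort components by decreasing variance
    T = eigvecs[:, order[:n]]               # rotation matrix of the top-n eigenvectors
    Y = Xc @ T                              # new uncorrelated features y = Tx
    return Y, T

# Example: reduce hypothetical 6-band pixels to 2 principal components
# X = np.random.rand(1000, 6)
# Y, T = pct(X, 2)
```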

4 Principal Component Transform

5 Applications of the PCT "Optimal" data representation and energy compaction; visualization and compression of multimodal images.

6 PCT of multispectral images Satellite image: B, G, R, nIR, IR, thermal IR

7 Why is PCT bad for classification purposes? PCT evaluates the contribution of individual features solely by their variance, which may differ from their discrimination power.
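A tiny synthetic illustration of this point (the data are made up for the example): one feature has large variance but identical class means, the other has small variance but well-separated means, so PCT ranks the first higher even though only the second separates the classes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Feature 1: huge variance, identical class means -> no discrimination power
# Feature 2: tiny variance, separated class means -> high discrimination power
class1 = np.column_stack([rng.normal(0, 10, 500), rng.normal(0.0, 0.1, 500)])
class2 = np.column_stack([rng.normal(0, 10, 500), rng.normal(1.0, 0.1, 500)])

X = np.vstack([class1, class2])
print(X.var(axis=0))   # PCT would keep feature 1 (largest variance) ...
                       # ... yet only feature 2 separates the classes
```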

8 Why is PCT bad for classification purposes?

9 Separability problem Dimensionality reduction methods for classification (a two-class problem here) must consider the discrimination power of individual features. The goal is to maximize the "distance" between the classes.

10 An example 3 classes, 3D feature space, reduction to 2D. [Figure: two 2D projections, one with high discriminability, one with low discriminability]

11 DR via feature selection Two things are needed: a discriminability measure (Mahalanobis distance, Bhattacharyya distance), e.g. MD_12 = (m_1 − m_2)(C_1 + C_2)^(-1)(m_1 − m_2)', and a selection strategy. Feature selection thus becomes an optimization problem; a sketch of the measure follows below.
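A hedged sketch of the Mahalanobis-type measure MD_12 above, evaluated on a candidate feature subset (the function and variable names are my own, not from the slides):

```python
import numpy as np

def mahalanobis_distance(X1, X2, subset):
    """MD_12 = (m1 - m2)(C1 + C2)^-1 (m1 - m2)' restricted to the chosen features.
    X1, X2 are (N1, D) and (N2, D) sample matrices of the two classes;
    subset is a list of feature indices."""
    A, B = X1[:, subset], X2[:, subset]
    d = A.mean(axis=0) - B.mean(axis=0)                      # m1 - m2
    S = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)    # C1 + C2
    return float(d @ np.linalg.solve(np.atleast_2d(S), d))   # quadratic form
```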

12 Feature selection strategies Optimal: full search over all D!/((D−n)! n!) subsets, or branch & bound. Sub-optimal: direct selection (optimal if the features are not correlated), sequential selection (SFS, SBS), generalized sequential selection (SFS(k), plus-k-minus-m, floating search). SFS is sketched below.
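Sequential Forward Selection (SFS) can be sketched as a greedy loop that, at each step, adds the single feature that most increases the discriminability criterion. This sketch reuses the mahalanobis_distance helper from the previous snippet; both names are illustrative assumptions, not the authors' code.

```python
def sfs(X1, X2, n):
    """Sequential Forward Selection: greedily pick n features maximizing MD_12."""
    D = X1.shape[1]
    selected = []
    while len(selected) < n:
        remaining = [j for j in range(D) if j not in selected]
        # add the feature giving the best criterion value when joined to the current set
        best = max(remaining,
                   key=lambda j: mahalanobis_distance(X1, X2, selected + [j]))
        selected.append(best)
    return selected
```

SBS works the same way in reverse (start from all D features and greedily remove), and the floating variants allow backtracking after each step.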

13 A priori knowledge in feature selection The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise.

14 A priori knowledge in feature selection The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise. Crucial questions in practical applications: Can the class-conditional distributions be assumed to be normal? What happens if this assumption is wrong?

15 A two-class example Class 2 Class 1 x2 is selected x1 is selected

16 Conclusion PCT is optimal for representation of "one-class" data (visualization, compression, etc.). PCT should not be used for classification purposes. Use feature selection methods based on a proper discriminability measure. If you still use PCT before classification, be aware of possible errors.

