An Efficient Greedy Method for Unsupervised Feature Selection


1 An Efficient Greedy Method for Unsupervised Feature Selection
Ahmed Farahat
Joint work with Ali Ghodsi and Mohamed Kamel
{afarahat, aghodsib, mkamel}@uwaterloo.ca
ICDM 2011

2 Outline
- Introduction
  - Dimension Reduction & Feature Selection
  - Previous Work
- Proposed Work
  - Feature Selection Criterion
  - Recursive Formula
  - Greedy Feature Selection
- Experiments and Results
- Conclusion

3 Dimension Reduction
In data mining applications, data instances are typically described by a huge number of features:
- Images (>2 megapixels)
- Documents (>10K words)
Most of these features are irrelevant or redundant.
Goal: Reduce the dimensionality of the data to
- allow a better understanding of the data
- improve the performance of other learning tasks

4 Feature Selection vs. Extraction
Feature Selection (a.k.a. variable selection)
- searches for a relevant subset of existing features
- (−) a combinatorial optimization problem
- (+) features are easy to interpret
Feature Extraction (a.k.a. feature transformation)
- learns a new set of features
- (+) unique solutions in polynomial time
- (−) features are difficult to interpret

5 Feature Selection
Wrapper vs. filter methods:
- Wrapper methods search for features which enhance the performance of the learning task: (+) more accurate, (−) more complex
- Filter methods analyze the intrinsic properties of the data and select highly-ranked features according to some criterion: (+) less complex, (−) less accurate
Supervised vs. unsupervised methods
This work: filter and unsupervised methods

6 Previous Work
- PCA-based: calculate PCA, associate features with principal components based on their coefficients, and select the features associated with the first principal components (Jolliffe, 2002)
- Sparse PCA-based: calculate sparse PCA (Zou et al., 2006) and select, for each principal component, the subset of features with non-zero coefficients
- Convex Principal Feature Selection (CPFS) (Masaeli et al., SDM’10): formulates a continuous optimization problem which minimizes the reconstruction error of the data matrix with sparsity constraints

7 Previous Work (Cont.)
- Feature Selection using Feature Similarity (FSFS) (Mitra et al., TPAMI’02): groups features into clusters and then selects a representative feature for each cluster
- Laplacian Score (LS) (He et al., NIPS’06): selects features that preserve similarities between data instances
- Multi-Cluster Feature Selection (MCFS) (Cai et al., KDD’10): selects features that preserve the multi-cluster structure of the data

8 This Work
- A criterion for unsupervised feature selection: minimizes the reconstruction error of the data matrix based on the selected subset of features
- A recursive formula for calculating the criterion
- An effective greedy algorithm for unsupervised feature selection

9 Feature Selection Criterion
[Figure: a data matrix (m instances, n features) and its reconstructed matrix; the loss between them is minimized by least squares.]

10 Feature Selection Criterion (Cont.)
Problem 1 (Unsupervised Feature Selection): Find a subset of features that minimizes the reconstruction error of the data matrix. This is an NP-hard combinatorial optimization problem.
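The criterion on slides 9 and 10 can be made concrete with a small sketch. It assumes the data matrix is called `A` (m instances by n features) and that "reconstruction" means a least-squares fit of all columns of `A` from the selected columns; the function name `reconstruction_error` is hypothetical, not taken from the slides.

```python
import numpy as np

def reconstruction_error(A, S):
    """Squared Frobenius error of rebuilding A from its columns indexed by S.

    Assumed reading of the slide's criterion: least-squares reconstruction.
    """
    A = np.asarray(A, dtype=float)
    A_S = A[:, list(S)]
    # Coefficients T minimizing ||A - A_S T||_F (ordinary least squares).
    T, *_ = np.linalg.lstsq(A_S, A, rcond=None)
    residual = A - A_S @ T
    return float(np.sum(residual ** 2))

# Illustrative data only: 20 instances, 6 features.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 6))
print(reconstruction_error(A, [0, 2]))
```

Evaluating this for every subset of a given size is what makes the exact problem combinatorial, which is the motivation for the greedy approach in the later slides.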

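11 Recursive Selection Criterion
Theorem 1: Given a set of features and any subset of it, the selection criterion of the full set is expressed in terms of the criterion of the subset. The formula and the accompanying diagram (labeled P and S) are not preserved in this transcript.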

12 Recursive Selection Criterion (Cont.)
Lemma 1: Stated for a set of features and any subset of it. The formula is not preserved in this transcript.
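The formulas for this part of the deck did not survive as text. As a hedged reading, assume $A$ is the $m \times n$ data matrix, $A_{:S}$ denotes its columns indexed by $S$, $P \subset S$, and $A_{:S}$ has full column rank; none of this notation is confirmed by the transcript. Under those assumptions, the criterion and the decomposition that a Schur-complement argument yields can be written as:

```latex
\begin{align*}
P^{(S)} &= A_{:S}\left(A_{:S}^{\top} A_{:S}\right)^{-1} A_{:S}^{\top}, &
F(S) &= \left\lVert A - P^{(S)} A \right\rVert_F^2, \\
E &= A - P^{(P)} A, &
R^{(R)} &= E_{:R}\left(E_{:R}^{\top} E_{:R}\right)^{-1} E_{:R}^{\top},
\quad R = S \setminus P, \\
P^{(S)} &= P^{(P)} + R^{(R)}, &
F(S) &= F(P) - \left\lVert R^{(R)} E \right\rVert_F^2 .
\end{align*}
```

These are standard identities for orthogonal projections. Whether the slides state Lemma 1 and Theorem 1 in exactly this form is an assumption.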

13 Proof of Lemma 1

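14 Proof of Lemma 1 (Cont.)
The proof takes the Schur complement of one block of a partitioned matrix and then applies the block-wise inversion formula. The specific matrices and equations are not preserved in this transcript.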

15 Recursive Selection Criterion (Cont.)
Corollary 1: Stated for a set of features and any subset of it. The proof follows from Lemma 1. The formulas are not preserved in this transcript.

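16 Recursive Selection Criterion
Theorem 1: Given a set of features and any subset of it, the selection criterion of the full set is expressed in terms of the criterion of the subset. The formula is not preserved in this transcript.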

17 Proof of Theorem 1
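The proofs on these slides are not preserved, but the kind of recursion they concern can be checked numerically. The sketch below assumes two projection identities that are standard linear algebra rather than confirmed slide content: the projector onto a feature set S splits into the projector onto a subset P plus the projector onto the residuals of the remaining features, and the reconstruction error drops by the squared norm of the newly explained part. All names (`projector`, `lemma_gap`, `theorem_gap`) are hypothetical.

```python
import numpy as np

def projector(X):
    """Orthogonal projector onto the column span of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

# Illustrative data only.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
S, P = [0, 3, 5, 6], [0, 3]          # P is a subset of S
R = [i for i in S if i not in P]     # features in S but not in P

P_S, P_P = projector(A[:, S]), projector(A[:, P])
E = A - P_P @ A                      # residual after projecting onto P
R_R = projector(E[:, R])             # projector onto the residual columns in R

def error(Q):
    """Reconstruction error of A under projector Q."""
    return np.linalg.norm(A - Q @ A, "fro") ** 2

# Both gaps should be at the level of floating-point round-off.
lemma_gap = np.abs(P_S - (P_P + R_R)).max()
theorem_gap = abs(error(P_S) - (error(P_P) - np.linalg.norm(R_R @ E, "fro") ** 2))
print(lemma_gap, theorem_gap)
```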

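18 Greedy Selection Criterion
Problem 2 (Greedy Feature Selection): At iteration t, find the feature l that optimizes the selection criterion given the features already selected. Using Theorem 1, Problem 2 is rewritten in an equivalent form. The equations are not preserved in this transcript.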
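A basic greedy loop for Problem 2 might look like the sketch below. It assumes the equivalent form is "pick the feature whose residual column explains the most of the current residual matrix", scored as ||EᵀE_i||² / (E_iᵀE_i); that scoring rule follows from the projection identities but is not confirmed by the transcript. The names `greedy_select` and `tol` are hypothetical.

```python
import numpy as np

def greedy_select(A, k, tol=1e-12):
    """Greedily pick k columns of A, deflating an explicit residual matrix E."""
    E = np.asarray(A, dtype=float).copy()
    selected = []
    for _ in range(k):
        G = E.T @ E                      # Gram matrix of the current residual
        g = np.diag(G).copy()            # squared norms of the residual columns
        ok = g > tol                     # skip columns that are already explained
        if not ok.any():
            break
        scores = np.where(ok, np.sum(G ** 2, axis=0) / np.where(ok, g, 1.0), -np.inf)
        l = int(np.argmax(scores))
        selected.append(l)
        q = E[:, l] / np.sqrt(g[l])      # unit vector along the chosen residual
        E = E - np.outer(q, q @ E)       # remove that direction from every column
    return selected

# Illustrative data only.
rng = np.random.default_rng(0)
print(greedy_select(rng.standard_normal((25, 7)), 3))
```

This version recomputes the full Gram matrix at every iteration, which is the cost the following slides set out to reduce.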

19 Greedy Selection Criterion (Cont.)

20 Greedy Selection Criterion (Cont.)
At iteration t: [equation not preserved]
Problems:
- Memory inefficient: [storage cost not preserved]
- Computationally complex: [cost not preserved] per iteration

21 Greedy Selection Criterion (Cont.)
At iteration t, define: [definitions not preserved]
Calculate E and G recursively as: [equations not preserved]
Define: [definitions not preserved]
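One way to read the recursion for G is sketched below, assuming E is the residual of the data matrix after projecting onto the selected features and G = EᵀE is its Gram matrix (the slide's own definitions are missing, so both are assumptions). Under that reading, selecting feature l turns G into G − ωωᵀ with ω = G_{:l} / sqrt(G_ll), so E never has to be formed. The names `greedy_select_gram` and `omega` are hypothetical.

```python
import numpy as np

def greedy_select_gram(A, k, tol=1e-12):
    """Greedy selection that updates only the Gram matrix G of the residual."""
    A = np.asarray(A, dtype=float)
    G = A.T @ A                          # at the start the residual is A itself
    selected = []
    for _ in range(k):
        g = np.diag(G).copy()
        ok = g > tol
        if not ok.any():
            break
        scores = np.where(ok, np.sum(G ** 2, axis=0) / np.where(ok, g, 1.0), -np.inf)
        l = int(np.argmax(scores))
        selected.append(l)
        omega = G[:, l] / np.sqrt(G[l, l])
        G = G - np.outer(omega, omega)   # rank-one downdate replaces recomputing E^T E
    return selected

# Illustrative data only.
rng = np.random.default_rng(2)
print(greedy_select_gram(rng.standard_normal((40, 9)), 4))
```

The update is cheap per step, but G is still an n × n matrix, which is the memory problem the next slide addresses.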

22 Memory-Efficient Selection
Update formulas for f and g
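The update formulas themselves are missing from the transcript. A sketch of what such updates can look like, assuming f_i = ||G_{:i}||² and g_i = G_ii for the residual Gram matrix G (an assumed reading of the symbols f and g), keeps only two length-n vectors plus one stored vector per selected feature. The names `greedy_select_low_memory` and `W` are hypothetical.

```python
import numpy as np

def greedy_select_low_memory(A, k, tol=1e-12):
    """Greedy selection tracking only f, g and one vector per selected feature."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    # f_i = ||A^T A_{:i}||^2 computed column by column, so no n x n matrix is stored.
    f = np.array([np.sum((A.T @ A[:, i]) ** 2) for i in range(n)])
    g = np.sum(A * A, axis=0)            # g_i = A_{:i}^T A_{:i}
    W = np.zeros((n, k))                 # stored omega vectors, one per iteration
    selected = []
    for t in range(k):
        ok = g > tol
        if not ok.any():
            break
        scores = np.where(ok, f / np.where(ok, g, 1.0), -np.inf)
        l = int(np.argmax(scores))
        selected.append(l)
        # Column l of the current G, rebuilt from A and the stored omegas.
        delta = A.T @ A[:, l] - W[:, :t] @ W[l, :t]
        omega = delta / np.sqrt(delta[l])
        # G @ omega without forming G.
        G_omega = A.T @ (A @ omega) - W[:, :t] @ (W[:, :t].T @ omega)
        f = f - 2.0 * omega * G_omega + (omega @ omega) * omega ** 2
        g = g - omega ** 2
        W[:, t] = omega
    return selected

# Illustrative data only.
rng = np.random.default_rng(2)
print(greedy_select_low_memory(rng.standard_normal((40, 9)), 4))
```

The f update comes from expanding ||G_{:i} − ω_i ω||² after the rank-one change G − ωωᵀ; the g update is the diagonal of that same change.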

23 Partition-based Selection
Greedy selection criterion: [cost not preserved] per iteration
- At each iteration: n candidate features × n projections
Solution:
- Partition features into c << n random groups
- Select the feature which best represents the centroids of these groups
- Similar update formulas can be developed for f and g
Complexity: [cost not preserved] per iteration
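The partition idea can be sketched as follows. The sketch assumes that groups are formed by a random permutation split into c nearly equal parts, that a centroid is the mean of the feature columns in a group, and that candidates are scored by how much of the centroid residual they explain; the slides confirm none of these details. It keeps explicit residual matrices for readability instead of the f and g updates, and the names `partition_greedy_select`, `B` and `H` are hypothetical.

```python
import numpy as np

def partition_greedy_select(A, k, c, seed=0, tol=1e-12):
    """Greedy selection scored against c random group centroids instead of all n features."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    if not 1 <= c <= n:
        raise ValueError("c must be between 1 and the number of features")
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n), c)     # random partition of the features
    B = np.column_stack([A[:, idx].mean(axis=1) for idx in groups])  # m x c centroids
    E, H = A.copy(), B.copy()            # residuals of the features and of the centroids
    selected = []
    for _ in range(k):
        g = np.sum(E * E, axis=0)
        ok = g > tol
        if not ok.any():
            break
        f = np.sum((H.T @ E) ** 2, axis=0)   # c projections per candidate, not n
        scores = np.where(ok, f / np.where(ok, g, 1.0), -np.inf)
        l = int(np.argmax(scores))
        selected.append(l)
        q = E[:, l] / np.sqrt(g[l])
        E = E - np.outer(q, q @ E)
        H = H - np.outer(q, q @ H)
    return selected

# Illustrative data only: 12 features partitioned into 3 groups.
rng = np.random.default_rng(3)
print(partition_greedy_select(rng.standard_normal((50, 12)), k=4, c=3))
```

With c equal to n, every group holds a single feature and the scores reduce to those of the unpartitioned greedy criterion.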

24

25 Experiments
Seven methods were compared:
- PCA-LRG: a PCA-based method that selects features associated with the first k principal components (Masaeli et al., 2010)
- FSFS: the Feature Selection using Feature Similarity method (Mitra et al., 2002)
- LS: the Laplacian Score method (He et al., 2006)
- SPEC: the spectral feature selection method (Zhao et al., 2007)
- MCFS: the Multi-Cluster Feature Selection method (Cai et al., 2010)
- GreedyFS: the basic greedy algorithm (using recursive update formulas for f and g but without random partitioning)
- PartGreedyFS: the partition-based greedy algorithm

26 Data Sets These data sets were recently used by Cai et al. (2010) to evaluate different feature selection methods in comparison to the Multi-Cluster Feature Selection (MCFS) method.

27 Results – k-means

28 Results – Affinity Propagation

29 Results – Run Times

30 Results – Run Times

31 Conclusion
This work presents a novel greedy algorithm for unsupervised feature selection:
- a feature selection criterion which measures the reconstruction error of the data matrix based on the subset of selected features
- a recursive formula for calculating the feature selection criterion
- an efficient greedy algorithm for feature selection, and two memory- and time-efficient variants
It has been empirically shown that the proposed algorithm
- achieves better clustering performance
- is less computationally demanding than methods that give comparable clustering performance

32 Thank you!

33 References
- I. Jolliffe, Principal Component Analysis, 2nd ed. Springer, 2002.
- H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,” J. Comput. Graph. Stat., 2006.
- M. Masaeli, Y. Yan, Y. Cui, G. Fung, and J. Dy, “Convex principal feature selection,” SIAM SDM 2010.
- X. He, D. Cai, and P. Niyogi, “Laplacian score for feature selection,” NIPS 2006.
- Y. Cui and J. Dy, “Orthogonal principal feature selection,” in the Sparse Optimization and Variable Selection Workshop, ICML 2008.
- Z. Zhao and H. Liu, “Spectral feature selection for supervised and unsupervised learning,” ICML 2007.
- D. Cai, C. Zhang, and X. He, “Unsupervised feature selection for multi-cluster data,” KDD 2010.
- P. Mitra, C. Murthy, and S. Pal, “Unsupervised feature selection using feature similarity,” IEEE Trans. Pattern Anal. Mach. Intell., 2002.

