Consensus Group Stable Feature Selection

1 Consensus Group Stable Feature Selection
Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

2 Overview
- Background and motivation
- Proposed Consensus Feature Group framework
  - Finding consensus groups
  - Feature selection from consensus groups
- Experimental study
- Conclusion
Loscalzo, Yu, Ding. Consensus Group Stable Feature Selection. June 30th, 2009.

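3 Feature Selection Stability
[Figure: all training data is split into samples 1 to k; on each sample, feature selection is followed by model building, giving a selected feature set and an accuracy]
- Sample 1: F = {f2, f5}, accuracy 92%
- Sample 2: F' = {f4, f10}, accuracy 91%
- ...
- Sample k: F'' = {f5, f11}, accuracy 93%
The input data is broken into different samples to better estimate classification performance. Accuracy is nearly the same on every sample, yet the selected feature sets barely overlap; this is the instability that stable feature selection addresses.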
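The sampling loop in the figure above can be sketched as follows. This is a hypothetical illustration, not the selection method used in the talk: it assumes a two-class data matrix `X` (rows are samples, columns are features), labels `y` in {0, 1}, and a simple difference-of-class-means score. The names `top_m_features` and `selections_on_subsamples` and the parameters `frac` and `seed` are introduced here for illustration.

```python
import random

# Hypothetical scorer: |mean of class 1 - mean of class 0| for each feature,
# computed only on the given rows (one sub-sample). Not the method from the talk.
def top_m_features(X, y, rows, m):
    scores = []
    for j in range(len(X[0])):
        pos = [X[r][j] for r in rows if y[r] == 1]
        neg = [X[r][j] for r in rows if y[r] == 0]
        score = abs(sum(pos) / len(pos) - sum(neg) / len(neg)) if pos and neg else 0.0
        scores.append((score, j))
    scores.sort(reverse=True)  # highest score first
    return frozenset(j for _, j in scores[:m])


# Draw k random sub-samples (a fraction `frac` of the rows, without replacement)
# and record the top-m feature set selected on each one.
def selections_on_subsamples(X, y, k, m, frac=0.9, seed=0):
    rng = random.Random(seed)
    n = len(X)
    size = max(2, int(frac * n))
    return [top_m_features(X, y, rng.sample(range(n), size), m) for _ in range(k)]
```

On data where many features are about equally (un)informative, the k returned sets typically differ from one another, much like F, F' and F'' on the slide; a strongly informative feature, by contrast, tends to be selected on every sub-sample.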

4 Motivation
Need for stable feature selection:
- Gives confidence in the features chosen for lab tests
- Uncovers "truly" relevant information
Utility of feature groups:
- Models feature interaction
- When little is known about a single feature, another feature in the same group may be well studied

5 Dense Feature Group Framework
Dense feature groups can provide both stability and accuracy [Yu, Ding, Loscalzo, KDD-08].
Dense Group Stable Feature Selection framework:
- Map features as points in the sample space
- Apply kernel density estimation to locate dense feature groups
- Select the top relevant groups from the dense groups
Limitations of this framework:
- Density estimation is unreliable in high-dimensional spaces
- Selection of relevant groups is restricted to dense groups
Speaker note: mention earlier that the data has been transposed (each dimension is one sample in the data).
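A rough sketch of the density step described above, assuming features are the columns of a sample-by-feature matrix and using plain Gaussian-kernel mean shift to find density peaks in sample space. This stands in for, and is not, the authors' dense group finder; the bandwidth `h`, the peak-merging tolerance `merge_tol`, and the function names are assumptions made for illustration.

```python
import math

# Gaussian-kernel mean shift: move x toward the weighted mean of all points
# until it stops moving, i.e. until it reaches a local density peak.
def mean_shift_mode(start, points, h, max_iter=200, tol=1e-7):
    x = list(start)
    for _ in range(max_iter):
        w = [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in points]
        total = sum(w)
        if total == 0.0:  # all weights underflowed; stay where we are
            break
        new_x = [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(len(x))]
        moved = math.dist(x, new_x)
        x = new_x
        if moved < tol:
            break
    return x


# Each feature is a point in sample space (its column of X). Features whose
# mean-shift runs end at the same peak (within merge_tol) form one dense group.
def dense_groups(X, h, merge_tol=None):
    merge_tol = h / 2 if merge_tol is None else merge_tol
    points = [[row[j] for row in X] for j in range(len(X[0]))]
    modes, groups = [], []
    for j, p in enumerate(points):
        m = mean_shift_mode(p, points, h)
        for g, mode in enumerate(modes):
            if math.dist(m, mode) < merge_tol:
                groups[g].append(j)
                break
        else:
            modes.append(m)
            groups.append([j])
    return groups
```

Note that every distance here is computed in sample space, whose dimension equals the number of samples; this is where the slide's concern about unreliable density estimation in high dimensions comes from.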

6 Consensus Feature Group Framework
- Consensus feature groups are an ensemble of feature grouping results
- Select relevant groups from the whole spectrum of consensus groups
Challenges:
- Base algorithm for the ensemble: dense group finder (DGF) [Yu, Ding, Loscalzo, KDD-08]
- Aggregating the feature grouping results
Speaker notes: use the dense group finder as the base algorithm, and select relevant features from the whole spectrum of consensus groups. For the first point, mention that the feature grouping results should be well-formed groups that differ across samples (accurate and diverse).

7 Group Aggregation
Three aggregation ideas:
- Heuristics (reference set)
- Cluster based [Fern, Brodley, ICML-03]
- Instance based [Fern, Brodley, ICML-03]
[Figure: each data sub-sample (1 to 3) yields a feature grouping result over features f1 to f5; the grouping results are aggregated into consensus feature groups over f1 to f5]
Speaker notes: keep circles to represent groups. The last two ideas recluster the features based on the first clusterings. The cluster-based approach treats each feature group as an object to cluster, which requires a way to measure similarity between groups. Highlight the instance-based approach.
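The instance-based idea above can be sketched as a co-occurrence matrix: for every pair of features, count how often they share a group across the grouping results. This is a minimal sketch assuming each grouping result is a partition of feature indices (each feature in at most one group); `co_occurrence` is a hypothetical name, and the three example groupings in the usage note are made up, since the exact groups in the figure cannot be recovered from the transcript.

```python
from itertools import combinations

# W[i][j] = fraction of grouping results in which features i and j share a group.
# Assumes each grouping is a list of disjoint groups of feature indices.
def co_occurrence(groupings, n_features):
    W = [[0.0] * n_features for _ in range(n_features)]
    for groups in groupings:
        for group in groups:
            for i, j in combinations(sorted(group), 2):
                W[i][j] += 1
                W[j][i] += 1
    t = len(groupings)
    for i in range(n_features):
        for j in range(n_features):
            W[i][j] /= t
        W[i][i] = 1.0  # a feature always co-occurs with itself
    return W
```

For example, with f1..f5 as indices 0..4 and groupings `[[[0, 1], [2, 3, 4]], [[0, 1, 2], [3, 4]], [[0, 1], [2], [3, 4]]]`, features 0 and 1 are grouped together in all three results (W = 1.0), while features 2 and 3 share a group in only one of them (W = 1/3).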

8 CGS: The Consensus Group Stable Feature Selection Algorithm
Input: data set D, number of sub-samples t, number of groups to return k
Output: the top k consensus feature groups
  for s = 1 to t do
    construct training partition D_s from D
    run DGF on D_s
  for every pair of features X_i and X_j in D do
    W_{i,j} := frequency with which X_i and X_j appear in the same group across the t results
  create consensus groups CG_1, CG_2, ..., CG_L via hierarchical clustering of all features based on W_{i,j}
  for l = 1 to L do
    obtain a representative feature R_l from CG_l
    measure the relevance of R_l and set it as the relevance of CG_l
  rank CG_1, CG_2, ..., CG_L by relevance and return the top k
[Figure: D is split into D_1, ..., D_t; each yields a grouping result (Result Grouping 1 to t); instance co-occurrence is measured across the results, and hierarchical clustering produces the consensus feature groups]
Speaker notes: each consensus group is now represented by a representative feature, so any feature selection algorithm can be applied; alternatively, the group center (a virtual feature) can be used.
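The clustering and ranking steps of the algorithm above can be sketched as follows, taking the co-occurrence matrix W as given. Several choices here are assumptions, not details from the slide: average-linkage agglomerative clustering on the distance 1 - W_{i,j}, stopped at a hypothetical `threshold`; a representative chosen as the member most often co-grouped with the rest of its group (the notes mention a group center as another option); and relevance supplied as a precomputed per-feature score `relevance`. The function names are likewise hypothetical.

```python
# Average-linkage agglomerative clustering on distance 1 - W[i][j].
# Merges the two closest clusters until their average distance exceeds threshold.
def average_linkage(W, threshold):
    clusters = [[i] for i in range(len(W))]

    def dist(a, b):
        return sum(1 - W[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Form consensus groups, score each by its representative's relevance,
# and return the top k as (relevance, group, representative) tuples.
def cgs_rank(W, relevance, k, threshold=0.5):
    scored = []
    for g in average_linkage(W, threshold):
        # Representative (assumption): member with the highest total co-occurrence within g.
        rep = max(g, key=lambda i: sum(W[i][j] for j in g))
        scored.append((relevance[rep], sorted(g), rep))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]
```

Because each group is collapsed to one representative before scoring, the ranking step works with any per-feature relevance measure, which matches the speaker note above.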

9 Experimental Setup
Setting: 10 random shuffles of the data, each followed by 10-fold cross-validation (9 folds for training, 1 fold for testing). Results shown are averages over 10 folds x 10 shuffles.

Data Set    # Genes   # Samples   # Classes
Colon       2000      62          2
Leukemia    7129      72          2
Lung        12533     181         2
Prostate    6034      102         2
Lymphoma    4026                  3
SRBCT       2308      63          4

Algorithms:
- CGS: t = 10 sub-samples
- DRAGS [Yu, Ding, Loscalzo, KDD-08]: feature selection based on the top dense groups
- SVM-RFE [Guyon et al., ML-02]: recursively eliminates features based on the weights found after training an SVM
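The evaluation protocol above (10 shuffles x 10-fold cross-validation) can be sketched as a generator of train/test index lists. This is a minimal sketch: the folds are not stratified by class (the slide does not say whether they were), and `shuffled_cv_splits` and `seed` are hypothetical names.

```python
import random

# Yield (train, test) index lists: n_shuffles random shuffles, each split into
# n_folds folds; every fold serves once as the test set for its shuffle.
def shuffled_cv_splits(n_samples, n_shuffles=10, n_folds=10, seed=0):
    rng = random.Random(seed)
    for _ in range(n_shuffles):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::n_folds] for f in range(n_folds)]  # round-robin fold assignment
        for f in range(n_folds):
            test = folds[f]
            train = [i for g in range(n_folds) if g != f for i in folds[g]]
            yield train, test
```

For the Colon data (62 samples) this yields 100 train/test pairs with test folds of 6 or 7 samples; reported numbers would then be averages over all 100 pairs.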

10 Stability
[Figure: two panels, "Selected Groups" and "Selected Features", comparing the stability of each method]
Stability is measured as the pairwise similarity between the results selected on different training sets, averaged over all pairs.
Speaker note: state that these measures are defined in the paper.
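The slide does not give the similarity measure (the paper defines its own), so this sketch assumes Jaccard similarity between selected feature sets and shows only the "pairwise similarity, then average" structure. `jaccard` and `average_pairwise_stability` are hypothetical names; measuring group-level stability would need a similarity between sets of groups instead.

```python
from itertools import combinations

# Jaccard similarity |A & B| / |A | B| between two selected feature sets (assumed measure).
def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Average the similarity over all pairs of selection results.
def average_pairwise_stability(selections, sim=jaccard):
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selection results")
    return sum(sim(a, b) for a, b in pairs) / len(pairs)
```

Applied to the three sets from the Feature Selection Stability slide, {f2, f5}, {f4, f10} and {f5, f11}, only one pair overlaps (Jaccard 1/3), so the average is 1/9, or about 0.11: low stability despite accuracies of 91 to 93%.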

11 Accuracy Results
[Figure: classification accuracy results]

12 Conclusion
Proposed the consensus group stable feature selection framework, which is:
- Stable
- Accurate
Future directions:
- Apply different ensemble techniques
- Incorporate new group-finding algorithms

13 References
Fern, X. Z., and Brodley, C. E. Random projection for high dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning (ML-02), 46:389–422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-08), 2008.

