Consensus Group Stable Feature Selection

Presentation transcript:

Consensus Group Stable Feature Selection
Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Overview
  Background and motivation
  Proposed consensus feature group framework
  Finding consensus groups
  Feature selection from consensus groups
  Experimental study
  Conclusion
Loscalzo, Yu, Ding Consensus Group Stable Feature Selection June 30th, 2009

Feature Selection Stability
The input data is broken into different samples to better estimate the classification performance.
[Figure: all training data, sampling, feature selection, model building]

Sample      Selected features    Acc %
Sample 1    F = {f2, f5}         92%
Sample 2    F' = {f4, f10}       91%
...
Sample k    F'' = {f5, f11}      93%

Motivation
Need for stable feature selection
  Give confidence to lab tests
  Uncover "truly" relevant information
Utility of feature groups
  Model feature interaction
  When little is known about a single feature, another feature in the group may be well studied

Dense Feature Group Framework
Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
Dense Group Stable Feature Selection Framework
  Map features as points in sample space
  Apply kernel density estimation to locate dense feature groups
  Select the top relevant groups from the dense groups
Limitations of this framework
  Unreliable density estimation in high-dimensional spaces
  Restricts selection of relevant groups to dense groups
Note: the data has been transposed (each dimension is one sample in the data)
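The first two steps of the dense group framework can be illustrated with a small sketch. It is not the dense group finder from [Yu, Ding, Loscalzo, KDD-08]; it only shows the transposition (features become points in sample space) and a Gaussian kernel density estimate at each feature point, up to a normalising constant. The function names, the toy data and the bandwidth value are assumptions made for illustration.

```python
import math

def transpose(data):
    """Turn a samples-by-features table into a list of feature points."""
    return [list(col) for col in zip(*data)]

def kernel_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate at every point."""
    n = len(points)
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(total / n)
    return densities

# Toy data (hypothetical): rows are samples, columns are features.
data = [
    [1.0, 1.1, 5.0],
    [2.0, 2.1, 0.0],
    [3.0, 2.9, 4.0],
]
features = transpose(data)          # each feature is a point in 3-D sample space
density = kernel_density(features, bandwidth=1.0)
```

In this toy example the first two features lie close together in sample space and so receive a higher density than the third. With only a few samples per point this works, but the estimate degrades as the number of samples (dimensions) grows, which is the limitation the slide names.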

Consensus Feature Group Framework
Consensus feature groups are an ensemble of feature grouping results
Select relevant groups from the whole spectrum of consensus groups
Challenges
  Base algorithm for the ensemble: dense group finder [Yu, Ding, Loscalzo, KDD-08]
  Aggregating the feature grouping results
Note: feature grouping results should be well-formed groups and differ across samples (accurate and diverse)

Group Aggregation
Three aggregation ideas:
  Heuristics (reference set)
  Cluster based [Fern, Brodley, ICML-03]
  Instance based [Fern, Brodley, ICML-03]
[Figure: feature group results from data sub-samples 1 to 3 over features f1 to f5, aggregated into consensus feature groups]
Notes:
  The last two ideas recluster the results based on the first clusters
  The cluster-based approach treats each feature group as an object to cluster (it needs a way to measure similarity)
  The instance-based approach is the one highlighted here
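The instance-based idea can be sketched as a co-occurrence count: for every pair of features, record the fraction of grouping results in which the two features fall into the same group. The three groupings below are hypothetical (the groups drawn on the slide are not recoverable from the transcript), and the function name is an assumption.

```python
from itertools import combinations

def cooccurrence(groupings, n_features):
    """W[i][j] = fraction of grouping results that put features i and j together."""
    counts = [[0] * n_features for _ in range(n_features)]
    for grouping in groupings:
        for group in grouping:
            for i, j in combinations(sorted(group), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    t = len(groupings)
    w = [[c / t for c in row] for row in counts]
    for i in range(n_features):
        w[i][i] = 1.0  # a feature always co-occurs with itself
    return w

# Hypothetical grouping results from three sub-samples; indices 0..4 stand for f1..f5.
groupings = [
    [{0, 1}, {2, 3}, {4}],
    [{0, 1}, {2}, {3, 4}],
    [{0, 1, 2}, {3, 4}],
]
W = cooccurrence(groupings, n_features=5)
```

Here f1 and f2 are grouped together in all three results, so their entry is 1.0, while f4 and f5 share a group in two of the three.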

CGS: The Consensus Group Stable Feature Selection Algorithm

The CGS Algorithm
    for i = 1 to t do
        Construct training partition Di from D
        Run DGF on Di
    for every pair of features Xi and Xj in D
        Update Wi,j := frequency with which Xi and Xj appear together in the results
    Create consensus groups CG1, CG2, ..., CGL via hierarchical clustering of all features based on Wi,j
    for i = 1 to L do
        Obtain a representative feature Xi from CGi
        Measure the relevance of Xi and set it as the relevance of CGi
    Rank CG1, CG2, ..., CGL and return the top k

[Figure: D is split into D1 ... Dt; each yields a result grouping (1 ... t); instance co-occurrence is measured; hierarchical clustering produces the consensus feature groups]
Notes:
  The original feature space is now represented by representative features, so any feature selection algorithm can be applied
  Alternatively, a center (virtual feature) can be picked for each group
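The second half of the algorithm (clustering on Wi,j, picking a representative, ranking) might look like the sketch below. Several choices here are assumptions, not details from the slides: average linkage, a similarity threshold as the stopping rule, the representative chosen as the feature with the highest total co-occurrence inside its group, and the relevance scores themselves, which are made-up numbers standing in for whatever relevance measure is used.

```python
def consensus_groups(w, threshold):
    """Average-linkage agglomerative clustering on a co-occurrence matrix.
    Merging stops once the most similar pair of clusters falls below threshold."""
    clusters = [[i] for i in range(len(w))]

    def similarity(a, b):
        return sum(w[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = similarity(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

def representative(group, w):
    """Assumed rule: the feature with the highest total co-occurrence in its group."""
    return max(group, key=lambda i: sum(w[i][j] for j in group))

def rank_groups(groups, w, relevance, k):
    """Score each group by the relevance of its representative; return the top k."""
    scored = [(relevance[representative(g, w)], g) for g in groups]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [g for _, g in scored[:k]]

# Hypothetical co-occurrence matrix for five features and made-up relevance scores.
W = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.0, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
relevance = [0.2, 0.5, 0.3, 0.8, 0.6]

groups = consensus_groups(W, threshold=0.5)
top = rank_groups(groups, W, relevance, k=1)
```

With these inputs the features split into two consensus groups, [0, 1, 2] and [3, 4], and the second group ranks first because its representative has the higher relevance score.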

Experimental Setup
Setting
  10 random shuffles of the data, each with 10-fold cross validation
  9/10 folds for training, 1/10 folds for testing
  Results shown are averages across 10 folds x 10 shuffles

Data Set    # Genes    # Samples    # Classes
Colon       2000       62           2
Leukemia    7129       72           --
Lung        12533      181          --
Prostate    6034       102          --
Lymphoma    4026       --           3
SRBCT       2308       63           4
(-- marks a value missing from the transcript)

Algorithms
  CGS: sub-samples t = 10
  DRAGS [Yu, Ding, Loscalzo, KDD-08]: top dense group based feature selection
  SVM-RFE [Guyon et al, ML-02]: recursively eliminates features based on weights found after training an SVM
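The evaluation protocol (10 shuffles, each followed by 10-fold cross validation) can be sketched as an index generator. The function name, the seed and the way folds are cut are assumptions; only the counts come from the slide.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_shuffles=10, seed=0):
    """Yield (train, test) index lists for n_shuffles rounds of n_folds-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_shuffles):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Cut the shuffled indices into n_folds nearly equal folds.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g in range(n_folds) if g != f for i in folds[g]]
            yield train, test

# 62 is the number of samples in the Colon data set.
splits = list(repeated_kfold(62))
```

Each of the 100 splits trains on nine folds and tests on the remaining one; reported numbers would then be averaged over all splits.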

Stability
[Figures: stability of selected groups; stability of selected features]
Pairwise similarity across groups, averaged. These measures are defined in the paper.
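As a rough illustration of "pairwise similarity, take average", the sketch below averages a set similarity over all pairs of selection results. The Jaccard index is used here as a stand-in; the measures actually used are defined in the paper and may differ. The feature subsets are the ones from the stability example earlier in the talk.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two feature sets (stand-in similarity, an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def stability(selections):
    """Average pairwise similarity over all pairs of selection results."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Feature subsets selected on three different samples of the training data.
runs = [{"f2", "f5"}, {"f4", "f10"}, {"f5", "f11"}]
score = stability(runs)
```

Only one of the three pairs shares a feature (f5), so the score is low, which is the instability the talk sets out to reduce.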

Accuracy Results
[Figure: accuracy results]

Conclusion
Proposed consensus group stable feature selection framework
  Stable
  Accurate
Future directions
  Apply different ensemble techniques
  Incorporate new group finding algorithms

References
Fern, X. Z., and Brodley, C. Random projection for high-dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 186-192, 2003.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning (ML-02), 46:389-422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD-08), 803-811, 2008.