Stable Feature Selection for Biomarker Discovery
Name: Goutham Reddy Bakaram | Student Id: 20299604 | Instructor: Dr. Dongchul Kim
Review article by Zengyou He and Weichuan Yu, School of Software, Dalian University of Technology, China

Contents
1. Abstract
2. Introduction
3. Causes of Instability
4. Existing Methods
5. Stability Measure
6. Analysis
7. Conclusions
8. Future Work

Abstract
1. The article reviews existing stable feature selection methods for biomarker discovery and organizes them under a generic hierarchical framework.
2. Two objectives: (a) providing an overview of this relatively new yet fast-growing topic for convenient reference; (b) categorizing existing methods under the framework to guide future research and development.
Keywords: feature selection; biomarker discovery; stability; machine learning

Introduction
What is feature selection? In machine learning, feature selection is the process of selecting a subset of relevant features for use in model construction. It is used for three reasons:
1. Simpler models
2. Shorter training times
3. Enhanced generalization by reducing overfitting
The discovery of biomarkers from high-throughput "omics" data is typically modeled as selecting the most discriminating features (or variables) for classification, e.g., discriminating healthy versus diseased samples, or different tumor stages.
The stability issue in feature selection has received much attention recently. This article reviews existing methods for stable feature selection in biomarker discovery, summarizes them within a unified framework, and provides a convenient reference for future research and development.
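To make the framing concrete, here is a minimal sketch (not from the article) of biomarker discovery as discriminative feature selection: rank genes by a two-sample t-statistic between the two classes and keep the top k. The data sizes, labels, and k are illustrative assumptions.

```python
# Rank features by two-sample t-statistic and keep the top k as candidate markers.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # 100 samples x 2000 features (e.g., genes)
y = rng.integers(0, 2, size=100)   # 0 = healthy, 1 = diseased

t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
ranking = np.argsort(-np.abs(t))   # most discriminating features first
selected = ranking[:20]            # top-20 features as candidate biomarkers
print(selected)
```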

Causes of Instability
There are three main sources of instability in biomarker discovery:
1. Algorithm design that ignores stability: classic feature selection methods aim at selecting a minimal subset of features that yields a classifier with the best predictive accuracy; the stability of the selected subset is not an objective.
2. The existence of multiple sets of true markers: real data may contain several sets of potential true markers. On the one hand, when many features are highly correlated, different ones may be selected under different settings. On the other hand, even if there are no redundant features, multiple non-correlated sets of real markers may still exist.
3. Small number of samples in high-dimensional data: gene expression and proteomics data typically contain only hundreds of samples but thousands of features, so small changes in the training set can change the selected subset considerably, as the sketch below illustrates.
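The third cause is easy to demonstrate. This hedged sketch (illustrative sizes, not from the article) selects the top 50 features on two bootstrap resamples of the same small, high-dimensional dataset; the overlap between the two selected sets is typically far below 1.0.

```python
# Demonstrate selection instability under resampling in a small-n, large-p setting.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5000))    # few samples, many features
y = rng.integers(0, 2, size=40)

def top_k(X, y, k=50):
    t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
    return set(np.argsort(-np.abs(t))[:k])

idx1 = rng.choice(40, size=40, replace=True)   # two bootstrap resamples
idx2 = rng.choice(40, size=40, replace=True)
s1, s2 = top_k(X[idx1], y[idx1]), top_k(X[idx2], y[idx2])
print(len(s1 & s2) / 50)           # fraction of shared features, usually well below 1.0
```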

Existing Methods
Fig. 1 organizes existing approaches into a hierarchical framework:
1. Ensemble feature selection and feature selection with prior feature relevance incorporate stability considerations into the algorithm design stage.
2. To handle data with highly correlated features, group feature selection treats a feature cluster as the basic unit of selection to increase robustness.
3. Sample injection tries to increase the sample size to address the small-sample-size versus large-feature-size issue.
Fig. 1: A hierarchical framework for stable feature selection methods.

Ensemble Feature Selection
1. Ensemble learning methods combine multiple learned models under the assumption that "two (or more) heads are better than one".
2. Ensemble feature selection techniques use a two-step procedure similar to ensemble classification and regression: (a) creating a set of different feature selectors; (b) aggregating the results of the component selectors into the ensemble output.
3. The second step is typically modeled as a rank aggregation problem: combining multiple rankings into a consensus ranking, which has been widely studied in the context of web search engines.
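A minimal sketch of step (b): mean-rank (Borda-style) aggregation is one simple way to combine component rankings; the article's methods may use other aggregation rules. The three input rankings below are illustrative assumptions.

```python
# Combine several component rankings into one consensus ranking by mean rank.
import numpy as np

# rankings[i][j] = rank that selector i assigns to feature j (0 = best)
rankings = np.array([
    [0, 1, 2, 3, 4],
    [1, 0, 3, 2, 4],
    [0, 2, 1, 4, 3],
])
mean_rank = rankings.mean(axis=0)
consensus = np.argsort(mean_rank)  # features ordered by consensus rank
print(consensus)
```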

Data Perturbation
In ensemble feature selection with data perturbation, different samplings of the original data are used to construct different feature selectors, as described in Fig. 2. These methods can be further distinguished by the sampling method, the component feature selection algorithm, and the rank aggregation method (Table 1).
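The sketch below assembles the pieces: each bootstrap sample yields one t-test ranking, and mean rank aggregates them. The sampling scheme, component selector, and aggregation rule are the three knobs mentioned above; the specific choices here are illustrative assumptions.

```python
# Ensemble feature selection via data perturbation: bootstrap + t-test + mean rank.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 1000))
y = rng.integers(0, 2, size=60)

def rank_features(X, y):
    t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
    order = np.argsort(-np.abs(t))
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))   # rank of each feature (0 = best)
    return ranks

all_ranks = []
for _ in range(50):                        # 50 bootstrap component selectors
    idx = rng.choice(len(y), size=len(y), replace=True)
    all_ranks.append(rank_features(X[idx], y[idx]))

consensus = np.argsort(np.mean(all_ranks, axis=0))
print(consensus[:20])                      # top-20 features by consensus
```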

Function Perturbation
1. Here we use function perturbation to refer to those ensemble feature selection methods whose component learners differ from each other. The basic idea is to capitalize on the strengths of different algorithms to obtain robust feature subsets.
2. Function perturbation differs from data perturbation in two respects: (a) it uses different feature selection algorithms rather than the same one; (b) it typically conducts feature selection directly on the original data (without sampling).
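A hedged sketch of function perturbation: several *different* selectors run on the same unsampled data, and their rankings are aggregated. The particular trio (t-statistic, mutual information, random forest importance) is an illustrative assumption, not the article's prescribed set.

```python
# Function perturbation: aggregate rankings from three different selectors.
import numpy as np
from scipy import stats
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 300))
y = rng.integers(0, 2, size=80)

def to_ranks(scores):                # higher score -> better (rank 0)
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
scores = [
    np.abs(t),
    mutual_info_classif(X, y, random_state=0),
    RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
]
consensus = np.argsort(np.mean([to_ranks(s) for s in scores], axis=0))
print(consensus[:10])
```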

Feature Selection with Prior Feature Relevance
In most biomarker discovery applications, we typically assume all features are equally relevant before the selection procedure. In practice, prior knowledge may be available to bias the selection towards features believed to be more relevant. Fig. 3 shows several ways of obtaining such prior knowledge; one feasible method is to seek advice from domain experts or relevant publications.
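One simple way to realize this bias, sketched below under stated assumptions: blend a data-driven score with a per-feature prior relevance weight. The prior vector and the mixing weight alpha are hypothetical stand-ins for expert or literature knowledge, not the article's specific formulation.

```python
# Bias selection with prior relevance: combined = alpha * data_score + (1 - alpha) * prior.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 100))
y = rng.integers(0, 2, size=50)
prior = rng.uniform(0.5, 1.0, size=100)   # stand-in for expert/literature knowledge

t, _ = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
data_score = np.abs(t) / np.abs(t).max()  # normalize to [0, 1]

alpha = 0.7                               # weight on the data-driven evidence
combined = alpha * data_score + (1 - alpha) * prior
selected = np.argsort(-combined)[:10]
print(selected)
```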

Group Feature Selection
1. One motivation for group feature selection is that groups of correlated features commonly exist in high-dimensional data, and such groups are resistant to variations in the training samples.
2. Group formation is the process of identifying groups of associated features. There are typically two classes of methods for this purpose: knowledge-driven methods and data-driven methods.
3. Group transformation generates a single coherent representation for each feature group. The transformation can range from simple approaches, such as taking the mean feature value, to more complicated methods, such as principal component analysis. A sketch of the full pipeline follows.
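This sketch uses one data-driven option for each stage: hierarchical clustering on the feature correlation matrix for group formation, the group mean for transformation, and a t-test over the group summaries for selection. The clustering threshold and data sizes are illustrative assumptions.

```python
# Group feature selection: form groups, summarize each group, select groups.
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
base = rng.normal(size=(60, 30))
X = np.repeat(base, 4, axis=1) + 0.1 * rng.normal(size=(60, 120))  # correlated blocks
y = rng.integers(0, 2, size=60)

# 1. Group formation: cluster features with small correlation distance.
dist = np.clip(1 - np.abs(np.corrcoef(X.T)), 0.0, None)
Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
groups = fcluster(Z, t=0.5, criterion="distance")

# 2. Group transformation: represent each group by the mean of its features.
summaries = np.column_stack([X[:, groups == g].mean(axis=1)
                             for g in np.unique(groups)])

# 3. Selection: score the group summaries, not individual features.
t_stat, _ = stats.ttest_ind(summaries[y == 0], summaries[y == 1], axis=0)
print(np.argsort(-np.abs(t_stat))[:5])    # top-5 feature groups
```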

Sample Injection
1. In biomarker discovery applications, the number of features is typically far larger than the sample size; this is one of the main sources of instability in feature selection. To increase the reproducibility of feature selection, one natural idea is to generate more samples.
2. However, collecting real samples from patients and healthy people is usually expensive and time-consuming. With this practical limitation in mind, researchers have sought alternative ways of enlarging the sample set.
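As a purely hypothetical minimal variant (not the method of any specific paper surveyed here): jitter existing samples with small Gaussian noise to inject artificial samples that keep their labels. The noise scale is an illustrative assumption.

```python
# Inject artificial samples by perturbing existing ones with small noise.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 500))
y = rng.integers(0, 2, size=30)

noise = 0.05 * X.std(axis=0) * rng.normal(size=X.shape)
X_aug = np.vstack([X, X + noise])   # injected samples keep their original labels
y_aug = np.concatenate([y, y])
print(X_aug.shape)                  # (60, 500)
```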

Stability Measure
1. How do we measure the "stability" of feature selection algorithms? Such a measure matters in two ways: on the one hand, it is indispensable for comparing the performance of different algorithms; on the other hand, it can serve as internal validation inside feature selection algorithms that take stability into account.
2. Measuring stability requires a similarity measure for feature selection results, which depends on the representation used by the algorithm. Formally, let training examples be described by a feature vector F = (f_1, f_2, …, f_m); then there are three types of representations:
- A subset of features: S = {s_1, s_2, …, s_k}, s_i ∈ {f_1, f_2, …, f_m}.
- A ranking vector: R = (r_1, r_2, …, r_m), 1 ≤ r_i ≤ m.
- A weighting-score vector: W = (w_1, w_2, …, w_m), w_i ∈ ℝ.
For simplicity of notation, the stability measures below are stated for only two selection results.

Feature Subset
Most subset-based measures are defined using the set-theoretic properties of two selected subsets S and S′.
MS1: the relative Hamming distance between the binary masks corresponding to the two subsets.
MS2: the Tanimoto coefficient, |S ∩ S′| / |S ∪ S′|, which measures the overlap between two sets of arbitrary cardinality.
MS3: the Dice–Sorensen index, 2|S ∩ S′| / (|S| + |S′|), the harmonic mean of |S ∩ S′|/|S| and |S ∩ S′|/|S′|.
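The three measures are simple to implement from the definitions above; this sketch follows those verbal definitions directly (m is the total number of features, and exact normalizations may vary across papers).

```python
# Subset-based stability measures over two selected feature sets.
def ms1_relative_hamming(S, S2, m):
    """MS1: fraction of the m mask bits on which the two subsets disagree."""
    return len(S ^ S2) / m                 # symmetric difference of the sets

def ms2_tanimoto(S, S2):
    """MS2: Tanimoto similarity, |S ∩ S'| / |S ∪ S'|."""
    return len(S & S2) / len(S | S2)

def ms3_dice(S, S2):
    """MS3: Dice–Sorensen index, 2|S ∩ S'| / (|S| + |S'|)."""
    return 2 * len(S & S2) / (len(S) + len(S2))

S, S2 = {0, 1, 2, 3}, {2, 3, 4, 5}
print(ms1_relative_hamming(S, S2, m=10), ms2_tanimoto(S, S2), ms3_dice(S, S2))
```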

Ranking List
1. The problem of comparing ranking lists is widely studied in contexts such as voting theory and web document retrieval.
2. One typical example is MR2, in which more weight is placed on top-ranked features, since these matter more than irrelevant features in the stability evaluation.
MR1: Spearman's rank correlation coefficient takes values in [−1, 1], where 1 means the two ranking lists are identical and −1 means they are in exactly inverse order.
MR2: the Canberra distance, a weighted version of Spearman's footrule distance.
MR3: the overlap score, originally proposed in earlier work and reformulated here under the assumption that only the top-ranked features are important.
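A sketch of the three ranking-list measures on toy rankings; the Canberra distance and top-k overlap follow the verbal definitions above, and exact normalizations vary across papers.

```python
# Ranking-list stability measures on two rankings of the same 5 features.
import numpy as np
from scipy import stats
from scipy.spatial.distance import canberra

r1 = np.array([1, 2, 3, 4, 5])     # rank of each feature (1 = best)
r2 = np.array([2, 1, 3, 5, 4])

mr1, _ = stats.spearmanr(r1, r2)   # MR1: in [-1, 1], 1 = identical lists
mr2 = canberra(r1, r2)             # MR2: sum_i |r_i - r'_i| / (r_i + r'_i)
k = 3                              # MR3: overlap among the top-k features
top1, top2 = set(np.argsort(r1)[:k]), set(np.argsort(r2)[:k])
mr3 = len(top1 & top2) / k
print(mr1, mr2, mr3)
```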

Weighting-score Vector
The computational burden of combinatorial search over feature subsets can be alleviated to some extent by a feature weighting strategy: allowing feature weights to take real values instead of binary ones makes well-established optimization techniques applicable in algorithm development. The weighting-score vector is seldom used for defining stability measures; Table 6 lists one such measure, MW1.
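One common choice for comparing two real-valued weight vectors is Pearson's correlation, sketched below; the exact definition of MW1 in Table 6 may differ, so treat this as an assumption-laden illustration.

```python
# A weighting-score stability measure: Pearson correlation of two weight vectors.
import numpy as np

w1 = np.array([0.9, 0.1, 0.4, 0.7])
w2 = np.array([0.8, 0.2, 0.5, 0.6])
mw1 = np.corrcoef(w1, w2)[0, 1]
print(mw1)                          # close to 1 for similar weightings
```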

Analysis
1. Of the three sources of instability in feature selection, the small number of samples in a high-dimensional feature space is the most difficult to address in biomarker discovery: collecting additional real samples is expensive and slow, while the feature dimension remains large.
2. Group feature selection is the most widely used strategy among existing stable feature selection approaches, because high-dimensional spaces contain many correlated features. However, it only helps when the instability stems from redundant (correlated) features.
3. One solution is a hybrid strategy that combines group feature selection with ensemble feature selection.

Future Work
The group feature selection strategy only helps when multiple sets of true markers arise from the existence of redundant features. However, multiple sets of true markers may share no correlated features; feature selection in that case is much harder than finding a minimal optimal feature set for classification. To our knowledge, no existing method or measure handles stability in this setting; the general problem is open and needs more research effort. Open questions include:
1. How can we directly measure the stability of features without resampling the training data?
2. Can we propose new methods that explicitly control the stability of the reported feature subset?
3. Are there other special requirements for biomarker discovery beyond stability?

Conclusion
This paper summarizes existing stable feature selection methods and stability measures. From both the theoretical and the practical perspective, stable feature selection is an important problem, and more effort should be devoted to this topic.