Computational Genomics and Proteomics, Lecture 9. CGP Part 2: Introduction. Based on slides by Martin Bachler.

Overview CGP Part 2
– Lesson 9: Introduction, description of the assignment, short overview of the papers
– Lesson 10: Mass Spectrometric Data Analysis (guest lecturer: Thang Pham, VUmc)
– Lesson 11: Team + E.M. meeting 1
– Lesson 12: Team + E.M. meeting 2
– Lesson 13: Presentation/Evaluation of Part 2

CGP Part 2
Main focus of CGP2: Feature Selection. Three papers are available for the teams to study:
– Each team (pair of students) chooses one paper
– Critical study of the topic/method of the paper (meeting with E.M.)
– Proposal of changes to the method (meeting with E.M.)
– Public presentation, discussion and evaluation (all teams)
One review paper on feature selection techniques in bioinformatics:
– Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007 Oct 1;23(19):2507-2517.

Papers
– K. Jong, E. Marchiori, M. Sebag, A. van der Vaart. Feature Selection in Proteomic Pattern Data with Support Vector Machines. Proceedings of CIBCB 2004, IEEE.
– K. Ye, A. Feenstra, A. IJzerman, J. Heringa, E. Marchiori. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine Learning approach for feature weighting. Bioinformatics, 2008.
– K. Ye, E.W. Lameijer, M.W. Beukers, A.P. IJzerman. A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors. Proteins: Structure, Function and Bioinformatics, 63 (2006).

Students' Evaluation
Based on:
– Discussions
– Report describing the topic/method and the proposed changes
– Presentation

Problem: where to focus attention?
A universal problem of intelligent (learning) agents is where to focus their attention: which aspects of the problem at hand are important or necessary to solve it? They must discriminate between the relevant and irrelevant parts of their experience.

What is feature selection?
Feature selection is the problem of selecting the subset of a learning algorithm's input variables upon which it should focus attention, while ignoring the rest (dimensionality reduction). Humans and animals do this constantly!

Feature selection in ML?
Why even think about feature selection in ML?
– The information about the target class is inherent in the variables!
– Naive theoretical view: more features => more information => more discriminative power.
– In practice there are many reasons why this is not the case!
– Also: optimization is (usually) good, so why not try to optimize the input encoding?

Feature selection in ML? Yes!
– Many domains have hundreds to tens of thousands of variables/features, many of them irrelevant or redundant!
– In domains with many features the underlying probability distribution can be very complex and very hard to estimate (e.g. dependencies between variables)!
– Irrelevant and redundant features can "confuse" learners!
– Limited training data!
– Limited computational resources!
– Curse of dimensionality!

Curse of dimensionality

The required number of samples (to achieve the same accuracy) grows exponentially with the number of variables! In practice the number of training examples is fixed, so the classifier's performance usually degrades for a large number of features. In many cases the information that is lost by discarding variables is made up for by a more accurate mapping/sampling in the lower-dimensional space.
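To make the growth concrete, here is a tiny illustrative calculation (not from the original slides): if we want roughly 10 sample points per axis direction to keep the same sampling density, the number of samples needed grows as 10^d with the dimension d.

```python
# Illustrative only: samples needed to keep ~10 points per axis as dimension grows.
for d in (1, 2, 5, 10, 20):
    print(f"{d:2d} features -> about {10 ** d:.0e} samples for the same coverage")
```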

Example ML problem: gene selection from microarray data
– Variables: gene expression coefficients corresponding to the amount of mRNA in a patient's sample (e.g. tissue biopsy)
– Task: separate healthy patients from cancer patients
– Usually only about 100 examples (patients) are available for training and testing (!!!)
– Number of variables in the raw data: typically several thousand to tens of thousands
– Does this work? ([8])
[8] C. Ambroise, G.J. McLachlan: Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS 99 (2002).

Example ML problem: text categorization
– Documents are represented by a vector whose dimension is the size of the vocabulary, containing word frequency counts
– With a vocabulary of tens of thousands of words, each document is represented by a very high-dimensional vector
– Typical tasks: automatic sorting of documents into web directories, detection of spam e-mail
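As a small illustration of this document representation, the sketch below builds word-frequency vectors for two made-up toy documents, assuming scikit-learn is available.

```python
# Minimal sketch of the bag-of-words representation described above.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "cheap pills buy now",          # spam-like
    "meeting agenda for monday",    # regular mail
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # one row per document, one column per vocabulary word

print(vectorizer.get_feature_names_out())  # the vocabulary (the feature names)
print(X.toarray())                         # word-frequency counts per document
```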

Motivation
– Especially when dealing with a large number of variables, there is a need for dimensionality reduction!
– Feature selection can significantly improve a learning algorithm's performance!

Approaches
– Wrapper: feature selection takes into account the contribution to the performance of a given type of classifier
– Filter: feature selection is based on an evaluation criterion for quantifying how well features (or feature subsets) discriminate the two classes
– Embedded: feature selection is part of the training procedure of a classifier (e.g. decision trees)

Embedded methods
– Attempt to jointly or simultaneously train both a classifier and a feature subset
– Often optimize an objective function that jointly rewards accuracy of classification and penalizes the use of more features
– Intuitively appealing
– Example: tree-building algorithms
(Adapted from J. Fridlyand)
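A minimal sketch of an embedded method of this kind, assuming scikit-learn: a tree ensemble is trained on all features and the importances it learns during training are used to keep a small subset (the data is synthetic and purely illustrative).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))          # 100 samples, 500 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only the first two features matter

# The forest selects features implicitly while it is being trained.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:10]
print("features kept by the embedded importances:", top)
```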

Approaches to Feature Selection (diagram)
– Filter approach: input features -> feature selection by a distance-metric score -> train model
– Wrapper approach: input features -> feature-selection search over feature sets -> train model -> importance of features given by the model
(Adapted from Shin and Jasso)

Filter methods
Pipeline: data in R^p -> feature selection -> data in R^s (s << p) -> classifier design
– Features are scored independently and the top s are used by the classifier
– Scores: correlation, mutual information, t-statistic, F-statistic, p-value, tree importance statistic, etc.
– Easy to interpret; can provide some insight into the disease markers
(Adapted from J. Fridlyand)
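A minimal sketch of such a filter, assuming scikit-learn: each feature is scored independently with the F-statistic and the top s are kept, as in the R^p -> R^s pipeline above (synthetic data, illustrative only).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))   # p = 1000 features, 60 samples
y = rng.integers(0, 2, size=60)   # two classes

selector = SelectKBest(score_func=f_classif, k=20)   # keep s = 20 features
X_reduced = selector.fit_transform(X, y)             # shape (60, 20)
print("indices of the selected features:", selector.get_support(indices=True))
```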

Problems with filter methods
– Redundancy in the selected features: features are considered independently and are not measured on the basis of whether they contribute new information
– Interactions among features generally cannot be explicitly incorporated (some filter methods are smarter than others)
– The classifier has no say in which features should be used: some scores may be more appropriate in conjunction with some classifiers than others
(Adapted from J. Fridlyand)

Dimension reduction: a variant on a filter method
– Rather than retaining a subset of s features, perform dimension reduction by projecting the features onto s principal components of variation (e.g. PCA)
– The problem is that we are no longer dealing with one feature at a time, but rather with a linear (or possibly more complicated) combination of all features. It may be good enough for a black box, but how does one build a diagnostic chip on a "supergene"? (even though we don't want to confuse the tasks)
– These methods tend not to work better than simple filter methods
(Adapted from J. Fridlyand)
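A minimal sketch of this variant, assuming scikit-learn: the p original features are projected onto s principal components, so each retained "feature" is a linear combination of all genes (synthetic data, illustrative only).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))     # 60 samples, 1000 original features

pca = PCA(n_components=5)           # s = 5 "supergenes"
Z = pca.fit_transform(X)            # each new feature mixes all 1000 original ones
print(Z.shape)                      # (60, 5)
print(pca.explained_variance_ratio_)
```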

Wrapper methods
Pipeline: data in R^p -> feature selection -> data in R^s (s << p) -> classifier design
– Iterative approach: many feature subsets are scored based on classification performance and the best one is used
– Selection of subsets: forward selection, backward selection, forward-backward selection, tree harvesting, etc.
(Adapted from J. Fridlyand)
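A minimal sketch of a wrapper, assuming scikit-learn: greedy forward selection in which every candidate subset is scored by the cross-validated accuracy of the classifier it wraps (synthetic data, illustrative only).

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
y = (X[:, 0] - X[:, 3] > 0).astype(int)

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=5,
    direction="forward",   # "backward" gives backward elimination instead
    cv=5,                  # each candidate subset is scored by 5-fold CV accuracy
)
sfs.fit(X, y)
print("features chosen by the wrapper:", np.flatnonzero(sfs.get_support()))
```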

Problems with wrapper methods
– Computationally expensive: for each feature subset to be considered, a classifier must be built and evaluated
– No exhaustive search is possible (2^p subsets to consider): generally only greedy algorithms are used
– Easy to overfit
(Adapted from J. Fridlyand)

Example: microarray analysis
– "Labeled" cases: 38 bone marrow samples (27 ALL, 11 AML), each containing 7129 gene expression values
– Train a model (using neural networks, support vector machines, Bayesian nets, etc.)
– Apply the model to 34 new, unlabeled bone marrow samples to predict AML/ALL and identify key genes
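The workflow on this slide could look roughly like the sketch below, assuming scikit-learn; the arrays stand in for the real expression matrices and are filled with random numbers here, so this is a shape-level illustration only.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(38, 7129))      # placeholder for the 38 labeled samples
y_train = np.array([0] * 27 + [1] * 11)    # 0 = ALL, 1 = AML
X_new = rng.normal(size=(34, 7129))        # placeholder for the 34 new samples

model = SVC(kernel="linear").fit(X_train, y_train)
predictions = model.predict(X_new)         # an AML/ALL call for each new sample
print(predictions)
```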

Microarray data challenges:
– Few samples for analysis (38 labeled)
– Extremely high-dimensional data (7129 gene expression values per sample)
– Noisy data
– Complex underlying mechanisms, not fully understood

Some genes are more useful than others for building classification models.
Example: genes 36569_at and 36495_at are useful.

Some genes are more useful than others for building classification models (expression plot: AML vs. ALL samples).

Some genes are more useful than others for building classification models.
Example: genes 37176_at and 36563_at are not useful.

Importance of feature (gene) selection
– The majority of genes are not directly related to leukemia
– Having a large number of features enhances the model's flexibility, but makes it prone to overfitting
– Noise and the small number of training samples make this even more likely
– Some types of models, like kNN, do not scale well with many features

Classification: CV error
– Training error: empirical error on the training data
– Error on an independent test set: test error
– Cross-validation (CV) error: leave-one-out (LOO) or n-fold CV. The N samples are split into n folds; in each round (n-1)/n of the samples are used for training and 1/n for testing, the errors are counted, and the CV error rate is summarized over the folds.
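A minimal sketch of these error estimates, assuming scikit-learn (synthetic data): training error, 10-fold CV error and leave-one-out CV error for the same classifier.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 200))
y = rng.integers(0, 2, size=38)
clf = KNeighborsClassifier(n_neighbors=3)

train_error = 1 - clf.fit(X, y).score(X, y)                          # empirical error
cv10_error = 1 - cross_val_score(clf, X, y, cv=10).mean()            # 10-fold CV error
loo_error = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()  # LOO error
print(train_error, cv10_error, loo_error)
```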

Two schemes of cross-validation
– CV1: for each LOO split of the N samples, train and test both the feature selector and the classifier, then count errors.
– CV2: perform feature selection once on all N samples, then for each LOO split train and test only the classifier, and count errors.

Difference between CV1 and CV2
– CV1: gene selection within LOOCV
– CV2: gene selection before LOOCV
– CV2 can yield an optimistic estimate of the true classification error
– CV2 was used in the paper by Golub et al.:
  – 0 training errors
  – 2 CV errors (5.26%)
  – 5 test errors (14.7%)
  – CV error differs from test error!
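The selection bias caused by CV2 can be reproduced with a short sketch, assuming scikit-learn: on pure noise (no real class signal) CV1, which re-runs the gene selection inside every LOO fold via a pipeline, reports an error near 50%, while CV2, which selects genes once on all samples before LOOCV, reports a misleadingly low error.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 2000))          # pure noise: no real class signal
y = rng.integers(0, 2, size=38)
loo = LeaveOneOut()

# CV1: selection happens inside each training fold
cv1 = make_pipeline(SelectKBest(f_classif, k=20), KNeighborsClassifier(3))
err_cv1 = 1 - cross_val_score(cv1, X, y, cv=loo).mean()

# CV2: selection done once on ALL samples, before cross-validation
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
err_cv2 = 1 - cross_val_score(KNeighborsClassifier(3), X_sel, y, cv=loo).mean()

print(f"CV1 (unbiased) error:   {err_cv1:.2f}")   # close to 0.5 on noise
print(f"CV2 (optimistic) error: {err_cv2:.2f}")   # misleadingly low
```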

Significance of classification results
Permutation test:
– Permute the class labels of the samples
– Compute the LOOCV error on the data with permuted labels
– Repeat the process a large number of times
– Compare with the LOOCV error on the original data:
  p-value = (# permutations where the LOOCV error on permuted data <= the LOOCV error on the original data) / (total # of permutations considered)
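A minimal sketch of this permutation test, assuming scikit-learn, whose permutation_test_score utility shuffles the labels, re-runs LOOCV each time, and returns an empirical p-value for the score on the original labels (synthetic data, illustrative only).

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, permutation_test_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 50))
y = rng.integers(0, 2, size=38)

score, perm_scores, p_value = permutation_test_score(
    KNeighborsClassifier(3), X, y,
    cv=LeaveOneOut(), n_permutations=200, random_state=0,
)
print(f"LOOCV accuracy = {score:.2f}, permutation p-value = {p_value:.3f}")
```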

Feature selection techniques in a nutshell: WHY? WHAT? HOW?
Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007 Oct 1;23(19):2507-2517.

Now it is the teams' work!
Meetings on 29/11 and 6/12:
– Each team comes to discuss with E.M. once a week for 30 minutes
In the last lecture (13 Dec):
– Report due date
– Public talk by each team
See you on 29/11!