Feature Selection of DNA Microarray Data

Presentation transcript:

Feature Selection of DNA Microarray Data
Presented by: Mohammed Liakat Ali
Course: 60-520, Fall 2005, University of Windsor
ali1p@uwindsor.ca
December 2, 2005

Outline
Introduction
Deployment of Feature Selection Methods
Class Separability Measures
Review of Minimum Redundancy Feature Selection Methods
Comparison with Our Experimental Results
Conclusions
Q & A

Introduction
Microarray Data
Representation of Objects
Classifiers
Feature Selection vs. Feature Extraction
Optimal Feature Subset for Classification

Microarray Data
Microarray technology is one of the most promising tools available to life science researchers.
Two technologies are used to produce DNA microarrays:
- cDNA arrays
- Affymetrix arrays (also known as DNA chips)
The final result of a microarray experiment is a set of numbers representing the expression levels of DNA fragments, i.e., genes.

Representation of Objects
Objects are represented by their characteristic features.
Three main reasons to keep dimensionality low:
- Measurement cost
- Classification accuracy
- To identify and monitor the target disease or function types
It is very important to represent an object with features having high discriminating ability.

Classifiers
A classifier uses the features of an object and a discriminant function to assign the object to a category, i.e., a class.
The domain-independent theory of classification is based on the abstraction provided by the features of the input data.
Classifiers can be divided into:
- linear
- non-linear

Feature Selection vs. Feature Extraction
In feature selection we try to find the best subset of the input feature set.
In feature extraction we create new features based on transformations or combinations of the original feature set.

Optimal Feature Subset for Classification
To find the optimal feature subset we have to evaluate an objective function over candidate subsets.
The number of subsets grows exponentially with the number of features.

Deployment of Feature Selection Methods
Based on their relation to the induction algorithm, feature selection methods can be grouped as:
- Embedded: part of the induction algorithm itself
- Filter: a separate process from the induction algorithm
- Wrapper: also a separate process, but one that uses the induction algorithm as a subroutine

Deployment of Feature Selection Methods

Feature Selection Methods
Based on whether they guarantee the optimal solution, feature selection methods can be divided into:
- Optimal selection methods
- Suboptimal selection methods

Feature Selection Methods

Optimal Selection Methods
Exhaustive Search
Branch and Bound Search

Exhaustive Search
Evaluate all possible subsets consisting of m features out of the total d features, i.e., C(d, m) = d! / (m! (d - m)!) subsets.
Guaranteed to find the optimal subset.
An exponential problem.
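The presentation contains no code, but a minimal sketch may make the cost of exhaustive search concrete. It assumes a NumPy data matrix and a caller-supplied criterion function; the names are illustrative, and the C(d, m) evaluations make it feasible only for very small d.

```python
from itertools import combinations

def exhaustive_search(X, y, m, criterion):
    """Evaluate every m-feature subset of a NumPy matrix X (n_samples x d).

    criterion(X_subset, y) -> float, larger is better.
    Cost is C(d, m) criterion evaluations, so this only works for small d.
    """
    d = X.shape[1]
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(d), m):
        score = criterion(X[:, list(subset)], y)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```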

Branch and Bound Search
Only a fraction of all possible feature subsets is evaluated.
Guaranteed to find the optimal subset.
The criterion function must satisfy the monotonicity property, i.e., for nested feature subsets X1 ⊂ X2 ⊂ … ⊂ Xd,
J(X1) ≤ J(X2) ≤ … ≤ J(Xd)
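A compact recursive sketch of branch and bound under the monotonicity assumption (not the optimized Narendra-Fukunaga node ordering); all names are illustrative. Because the criterion can only decrease as features are removed, any partial subset that already scores no better than the current best can be pruned together with everything below it.

```python
import numpy as np

def branch_and_bound(X, y, m, criterion):
    """Optimal m-feature subset for a monotonic criterion (larger is better).

    Assumes criterion(S) <= criterion(T) whenever S is a subset of T, so
    whole branches can be pruned once a partial subset falls below the bound.
    """
    d = X.shape[1]
    best = {"subset": None, "score": -np.inf}

    def recurse(current, start):
        score = criterion(X[:, current], y)
        if score <= best["score"]:
            return                      # prune: removing more features cannot help
        if len(current) == m:
            best["subset"], best["score"] = tuple(current), score
            return
        # remove one more feature; positions >= start avoid duplicate subsets
        for i in range(start, len(current)):
            recurse(current[:i] + current[i + 1:], i)

    recurse(list(range(d)), 0)
    return best["subset"], best["score"]
```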

Suboptimal Selection Methods
Best Individual Feature
Sequential Forward Selection (SFS)
Sequential Backward Selection (SBS)
"Plus l take away r" Selection
Sequential Forward Floating Search (SFFS)
Sequential Backward Floating Search (SBFS)

Best Individual Feature
Evaluate all d features individually using a scalar criterion function.
Select the m best features.
Clearly a suboptimal method.
Complexity is O(d).
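A short sketch of individual feature ranking. The scalar criterion used here is a two-class Fisher-discriminant-ratio style score, chosen only because the experiments later in these slides rank genes by FDR; it is one possible choice among many, and the function names are illustrative.

```python
import numpy as np

def fdr_score(X, y):
    """Per-feature Fisher-discriminant-ratio style score for a two-class problem."""
    X1, X2 = X[y == np.unique(y)[0]], X[y == np.unique(y)[1]]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0) + 1e-12   # guard against zero variance
    return num / den

def best_individual_features(X, y, m):
    """Rank all d features by their individual score and keep the top m."""
    return np.argsort(fdr_score(X, y))[::-1][:m]
```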

Sequential Forward Selection (SFS)
Start by selecting the best single feature according to a scalar criterion function.
Then add one feature at a time: the one that, together with the already selected features, maximizes the criterion function J(.).
A greedy algorithm; it cannot retract an earlier choice.
Complexity is O(d) selection steps, each evaluating at most d candidate subsets.
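A minimal SFS sketch matching the description above; the criterion function is supplied by the caller and all names are illustrative.

```python
import numpy as np

def sequential_forward_selection(X, y, m, criterion):
    """Greedy SFS: start empty and repeatedly add the single feature whose
    addition maximizes criterion(X[:, selected], y); choices are never undone."""
    d = X.shape[1]
    selected, remaining = [], list(range(d))
    while len(selected) < m:
        scores = [criterion(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Sequential backward selection on the next slide is the mirror image: start from all d features and greedily delete the one whose removal hurts the criterion least.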

Sequential Backward Selection (SBS)
Start with all d features selected.
Delete one feature at a time, keeping the subset that maximizes the criterion function J(.).
Also a greedy algorithm; it cannot retract an earlier choice.
Complexity is O(d) deletion steps.

“Plus l take away r” Selection
First add l features by forward selection, then discard r features by backward selection.
The values of l and r have to be chosen in advance.
Unlike SFS and SBS, it does not suffer from the subset nesting problem.

Sequential Forward Floating Search (SFFS)
A generalized “plus l take away r” algorithm.
The values of l and r are determined automatically.
Close to the optimal solution.
Affordable computational cost.

Sequential Backward Floating Search (SBFS)
Also a generalized “plus l take away r” algorithm, like SFFS.
The values of l and r are also determined automatically.
Close to the optimal solution, as with SFFS.
More efficient than SFFS when m is closer to d than to 1.

Class Separability Measures
Divergence
Scatter Matrices

Divergence
By Bayes rule, given two classes ω1 and ω2 and a feature vector x, we select ω1 if P(ω1|x) > P(ω2|x).
Hence the ratio P(ω1|x) / P(ω2|x) has discriminating capability.

Divergence
For given P(ω1) and P(ω2), the same information resides in
D12(x) = ln [ p(x|ω1) / p(x|ω2) ]
For completely overlapping classes, D12(x) = 0.

Divergence
Since x takes different values, it is natural to consider the mean value over class ω1:
D12 = ∫ p(x|ω1) ln [ p(x|ω1) / p(x|ω2) ] dx
Similarly for ω2:
D21 = ∫ p(x|ω2) ln [ p(x|ω2) / p(x|ω1) ] dx
Their sum is the divergence: d12 = D12 + D21
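To make the definitions above concrete, here is a small numerical sketch that estimates D12, D21, and d12 for two hypothetical one-dimensional Gaussian class-conditional densities; the means and variances are arbitrary toy values.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

# Hypothetical class-conditional densities p(x|ω1) and p(x|ω2)
p1 = norm(loc=0.0, scale=1.0)
p2 = norm(loc=2.0, scale=1.5)

x = np.linspace(-10, 12, 20001)              # dense grid for numerical integration
f1, f2 = p1.pdf(x), p2.pdf(x)

D12 = trapezoid(f1 * np.log(f1 / f2), x)     # mean of ln p(x|ω1)/p(x|ω2) over ω1
D21 = trapezoid(f2 * np.log(f2 / f1), x)     # mean of ln p(x|ω2)/p(x|ω1) over ω2
d12 = D12 + D21                              # symmetric divergence

print(D12, D21, d12)                         # identical densities would give 0
```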

Scatter Matrices
Computation of the divergence is not easy for non-Gaussian distributions.
The within-class scatter matrix is defined as
Sw = Σi P(ωi) Si
where Si is the covariance matrix for class ωi:
Si = E[(x - μi)(x - μi)']

Scatter Matrices
The between-class scatter matrix is defined as
Sb = Σi P(ωi) (μi - μ0)(μi - μ0)'
where μ0 = Σi P(ωi) μi is the global mean vector.

Scatter Matrices
The total mixture scatter matrix is defined as
Sm = E[(x - μ0)(x - μ0)']
It can be shown that Sm = Sw + Sb.

Scatter Matrices
Among others, the following criterion functions can be defined:
J1 = trace(Sm) / trace(Sw)
J2 = |Sm| / |Sw|
J3 = trace(Sw⁻¹ Sm)
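The original slide formulas for J1, J2, and J3 did not survive the transcript; the sketch below follows the standard textbook forms used in the reconstruction above (trace ratio, determinant ratio, trace of Sw⁻¹Sm). The names are illustrative and Sw is assumed to be nonsingular.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw), between-class (Sb), and mixture (Sm) scatter matrices."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    mu0 = X.mean(axis=0)                               # global mean = sum_i P(ωi) μi
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c, p in zip(classes, priors):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sw += p * np.cov(Xc, rowvar=False, bias=True)  # P(ωi) * Si
        Sb += p * np.outer(mu - mu0, mu - mu0)
    return Sw, Sb, Sw + Sb

def separability_criteria(Sw, Sb, Sm):
    """J1, J2, J3 as reconstructed above; larger values mean better separated classes."""
    J1 = np.trace(Sm) / np.trace(Sw)
    J2 = np.linalg.det(Sm) / np.linalg.det(Sw)
    J3 = np.trace(np.linalg.solve(Sw, Sm))             # requires Sw to be invertible
    return J1, J2, J3
```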

Scatter Matrices
For an equally probable two-class problem in one dimension:
|Sw| is proportional to σ1² + σ2²
|Sb| is proportional to (µ1 - µ2)²

Review of Minimum Redundancy Feature Selection Methods
We now discuss two minimum redundancy feature selection methods, given in the following papers:
Ding and Peng (2003)
Yu and Liu (2004)

Review of Minimum Redundancy Feature Selection Methods
Ding and Peng (2003):
A filter method.
The search algorithm is SFS.
The first feature is selected using max V1, i.e., as the gene in the set S with the highest mutual information with the class.

Review of Minimum Redundancy Feature Selection Methods
Suppose m features have already been selected into the set X.
Additional features are selected from the set Y = S - X.
The following two conditions are optimized simultaneously for a candidate gene g in Y:
1. Maximum relevance: maximize I(h, g), the mutual information between g and the class h.
2. Minimum redundancy: minimize (1/|X|) Σ over gi in X of I(g, gi), the average mutual information with the already selected genes.

Review of Minimum Redundancy Feature Selection Methods
The mutual information I of two variables x and y is defined as
I(x, y) = ∫∫ p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy
The importance of minimum redundancy is highlighted in the paper.
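A sketch of the incremental minimum-redundancy/maximum-relevance selection described above, estimating mutual information with a simple histogram. The difference form used to combine relevance and redundancy (the MID variant in Ding and Peng's terminology), the histogram estimator, and the function names are assumptions of this sketch, not code from the paper.

```python
import numpy as np

def mutual_information(a, b, bins=10):
    """Histogram estimate of I(a; b) for two 1-D numeric variables (in nats)."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def mrmr_select(X, y, m, bins=10):
    """Greedy mRMR: first gene by maximum relevance, then trade relevance
    against the average mutual information with the genes already selected."""
    d = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y, bins) for j in range(d)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < m:
        candidates = [j for j in range(d) if j not in selected]
        scores = []
        for j in candidates:
            redundancy = np.mean([mutual_information(X[:, j], X[:, i], bins)
                                  for i in selected])
            scores.append(relevance[j] - redundancy)   # MID-style combination
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```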

Review of Minimum Redundancy Feature Selection Methods
Yu and Liu (2004):
A filter method.
The algorithm:
Relevance analysis
1. Order the features by decreasing ISU value.
Redundancy analysis
2. Initialize Fi with the first feature in the list.
3. Find and remove all features for which Fi forms an approximate redundant cover.
4. Set Fi to the next remaining feature in the list and repeat step 3 until the end of the list.

Review of Minimum Redundancy Feature Selection Methods
The method combines SFS with elimination.
The entropy of a variable X is defined as
H(X) = - Σi P(xi) log2 P(xi)
The entropy of X after observing values of another variable Y is defined as
H(X|Y) = - Σj P(yj) Σi P(xi|yj) log2 P(xi|yj)
The amount by which the entropy of X decreases reflects the additional information about X provided by Y; it is called the information gain:
IG(X|Y) = H(X) - H(X|Y)

Review of Minimum Redundancy Feature Selection Methods
Symmetrical uncertainty is defined as
SU(X, Y) = 2 · IG(X|Y) / (H(X) + H(Y))
Individual C-correlation (ISUi): the correlation SU(Fi, C) between a feature Fi and the class C.
Combined C-correlation (CSUi_j): the correlation between the joint feature (Fi, Fj), i ≠ j, and the class C.
Approximate redundant cover: for two features Fi and Fj, Fi forms an approximate redundant cover for Fj iff ISUi ≥ ISUj and ISUi ≥ CSUi_j.
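The sketch below puts these definitions together into a small redundancy-based filter in the spirit of Yu and Liu (2004). It assumes the expression values have already been discretized to non-negative integer codes, and the pair-encoding used for the combined C-correlation is one simple choice; names and details are illustrative rather than the authors' implementation.

```python
import numpy as np

def entropy(a):
    """Shannon entropy (base 2) of a discrete variable."""
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(a, b):
    """H(a | b) for discrete variables."""
    vals, counts = np.unique(b, return_counts=True)
    p = counts / counts.sum()
    return float(sum(pi * entropy(a[b == v]) for pi, v in zip(p, vals)))

def symmetrical_uncertainty(a, b):
    """SU(a, b) = 2 * [H(a) - H(a|b)] / (H(a) + H(b))."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    return 2.0 * (ha - conditional_entropy(a, b)) / (ha + hb)

def redundancy_based_filter(X, y):
    """Keep features that are not approximately redundantly covered by a
    better-ranked feature; X must be integer-coded (discretized)."""
    d = X.shape[1]
    isu = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(d)])
    order = list(np.argsort(isu)[::-1])            # decreasing ISU
    selected = []
    while order:
        i = order.pop(0)
        selected.append(int(i))
        survivors = []
        for j in order:
            # combined C-correlation: SU of the pair-encoded (Fi, Fj) with the class
            fij = X[:, i] * (X[:, j].max() + 1) + X[:, j]
            csu_ij = symmetrical_uncertainty(fij, y)
            if isu[i] >= isu[j] and isu[i] >= csu_ij:
                continue                            # Fi covers Fj: drop Fj
            survivors.append(j)
        order = survivors
    return selected
```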

Comparison with Our Experimental Results
To investigate the feature selection problem we implemented a filter method.
We used FDR as the criterion function.
Initial gene selection was based on gene ranking.
Fisher and Loog-Duin discriminant techniques were then applied to transform the feature space.
Linear and quadratic classifiers were then used.
10-fold cross-validation was applied.
We used Leukemia, Lung cancer, and Breast cancer data from the UCI repository.
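A present-day sketch of this kind of pipeline using scikit-learn. It assumes FDR here means a Fisher-discriminant-ratio style gene score, and it substitutes plain LDA/QDA for the Fisher and Loog-Duin transforms followed by linear/quadratic classifiers (the Loog-Duin heteroscedastic extension is not available in scikit-learn). The gene count and all names are illustrative; this is not the setup that produced Table 1.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

def fdr_scores(X, y):
    """Fisher-discriminant-ratio style score per gene for a two-class problem."""
    c0, c1 = np.unique(y)
    X0, X1 = X[y == c0], X[y == c1]
    scores = (X0.mean(0) - X1.mean(0)) ** 2 / (X0.var(0) + X1.var(0) + 1e-12)
    return scores, np.zeros_like(scores)        # (scores, dummy p-values) for SelectKBest

def evaluate(X, y, n_genes=80):
    """Mean 10-fold CV accuracy of linear and quadratic classifiers on FDR-ranked genes."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    results = {}
    for name, clf in [("linear", LinearDiscriminantAnalysis()),
                      ("quadratic", QuadraticDiscriminantAnalysis(reg_param=0.1))]:
        pipe = Pipeline([("rank", SelectKBest(fdr_scores, k=n_genes)),
                         ("clf", clf)])
        results[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    return results
```

Putting the gene ranking inside the pipeline means it is redone on each training fold, which avoids the selection bias that occurs when genes are ranked on the full data before cross-validation.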

Comparison with our Experimental Results Dataset #G #S #SG RBF #S #SG FQ LDQ FL LDL Leukemia 7129 72 4 87.50 72 80 98.75 59.23 98.75 95.00 Lung cancer 12533 181 6 98.34 197 367 67.12 49.89 77.32 73.60 Breast cancer 24481 97 67 79.38 97 273 78.63 68.72 78.63 74.70 Table 1. Comparison of gene selection results. RBF = Redundancy Based Filter FQ = Fisher’s Discriminant + Quadratic classifier FL = Fisher’s Discriminant + Linear classifier LDQ = Loog-Duin’s Discriminant + Quadratic classifier LDL = Loog-Duin’s Discriminant + Linear classifier December 2, 2005

Comparison with Our Experimental Results
From the table we can observe that RBF selected very compact gene sets in all cases.
FQ and FL outperform LDQ and LDL on all 3 datasets.
RBF outperforms all other methods on 1 dataset, by a big margin.
FQ and FL jointly outperform the others on 1 dataset, also by a big margin.
RBF, FQ, and FL give comparable results on 1 dataset.

Conclusions
We can conclude that minimum redundancy methods select very compact gene sets.
This can help to identify and monitor the target disease or function types.

Conclusions
From our experience, on average the performance of LDQ is better than that of FQ, because Fisher discriminant analysis is linear in nature.
Here we selected genes by FDR ranking; because of this, the performance of FQ and FL may be enhanced.
From the results we can also conclude that gene selection by ranking alone has some merit.

References
1. Blum, A. and Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2), 245–271.
2. Cover, T. M. (1974). The best two independent measurements are not the two best. IEEE Trans. Systems, Man, and Cybernetics, 4, 116–117.
3. Ding, C. and Peng, H. C. (2003). Minimum redundancy feature selection from microarray gene expression data. Proc. Second IEEE Computational Systems Bioinformatics Conf., 523–528.
4. Duda, R., Hart, P., and Stork, D. (2000). Pattern Classification, 2nd edition. John Wiley and Sons, New York, NY.
5. Van Horn, K. S. and Martinez, T. (1994). The minimum set problem. Neural Networks, 7(3), 491–494.

References
6. Jain, A. K., Duin, R. P. W., and Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1).
7. Loog, M. and Duin, R. P. W. (2004). Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 732–739.
8. Theodoridis, S. and Koutroumbas, K. (2003). Pattern Recognition, 2nd edition. Elsevier Academic Press.
9. Yu, L. and Liu, H. (2004). Redundancy based feature selection for microarray data. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 737–742.

Q & A
Thank you.