
1
**Feature Selection of DNA Microarray Data**

Presented by: Mohammed Liakat Ali Course: Fall 2005 University of Windsor December 2, 2005

2
**Outline**

- Introduction
- Deployment of Feature Selection Methods
- Class Separability Measures
- Review of Minimum Redundancy Feature Selection Methods
- Comparison with our Experimental Results
- Conclusions
- Q & A

3
**Introduction**

- Microarray Data
- Representation of Objects
- Classifiers
- Feature Selection vs. Feature Extraction
- Optimal Feature Set for Classification

4
**Microarray Data**

- Microarray technology is one of the most promising tools available to life science researchers.
- Two technologies are used to produce DNA microarrays:
  - cDNA arrays
  - Affymetrix technology
- Also known as a DNA chip.
- The final result of a microarray experiment is a set of numbers representing the expression levels of DNA fragments, i.e., genes.

5
**Representation of Objects**

- Objects are represented by their characteristic features.
- Three main reasons to keep dimensionality low:
  - Measurement cost
  - Classification accuracy
  - To identify and monitor the target disease or function types
- It is very important to represent an object with features that have high discriminating ability.

6
**Classifiers**

- A classifier uses the features of an object and a discriminant function to assign the object to a category, i.e., a class.
- The domain-independent theory of classification is based on the abstraction provided by the features of the input data.
- Classifiers can be divided into:
  - Linear
  - Non-linear

7
**Feature Selection vs. Feature Extraction**

- In feature selection, we try to find the best subset of the input feature set.
- In feature extraction, we create new features based on a transformation or combination of the original feature set.

8
**Optimal Feature Subset for Classification**

- To find the optimal feature subset, we have to evaluate the objective function for the candidate subsets.
- The number of candidate subsets grows exponentially with the number of features.

9
**Deployment of Feature Selection Methods**

Based on their relation to the induction algorithm, feature selection methods can be grouped as:

- Embedded: they are part of the induction algorithm.
- Filter: they are separate processes from the induction algorithm.
- Wrapper: they are also separate processes from the induction algorithm, but they use the induction algorithm as a subroutine.

10
**Deployment of Feature Selection Methods**


11
**Feature Selection Methods**

Based on whether they guarantee the optimal solution of the problem, feature selection methods can be divided into:

- Optimal selection methods
- Suboptimal selection methods

12
**Feature Selection Methods**


13
**Optimal Selection Methods**

- Exhaustive Search
- Branch and Bound Search

14
**Exhaustive Search**

- Evaluate all possible subsets consisting of $m$ features out of the total $d$ features, i.e., $\binom{d}{m}$ subsets.
- Guaranteed to find the optimal subset.
- An exponential problem.
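As a minimal sketch of exhaustive search, the snippet below enumerates every size-m subset with `itertools.combinations` and keeps the best one. The function name `exhaustive_search`, the criterion `toy_criterion`, and its weights are hypothetical and serve only to make the example runnable; they are not taken from the presentation.

```python
from itertools import combinations
from math import comb

def exhaustive_search(d, m, criterion):
    """Evaluate every m-feature subset of d features and return the best one."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(d), m):
        score = criterion(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical criterion: additive feature weights, minus a penalty when the
# redundant pair (0, 1) is selected together.
WEIGHTS = [9, 8, 3, 5]

def toy_criterion(subset):
    score = sum(WEIGHTS[i] for i in subset)
    if 0 in subset and 1 in subset:
        score -= 6
    return score

best_subset, best_score = exhaustive_search(4, 2, toy_criterion)
num_subsets = comb(4, 2)  # number of subsets that had to be evaluated
```

The count returned by `math.comb` grows very quickly: choosing 10 features out of 1000 already gives roughly 2.6e23 subsets, which is why exhaustive search is impractical for microarray-sized feature sets.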

15
**Branch and Bound Search**

- Only a fraction of all possible feature subsets is evaluated.
- Guaranteed to find the optimal subset.
- The criterion function must satisfy the monotonicity property, i.e., $J(X_1) \le J(X_2)$ whenever $X_1 \subset X_2$.
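The pruning idea can be sketched as follows, assuming a monotone criterion (removing features never increases J). The search starts from the full feature set and removes one feature at a time; a branch is abandoned as soon as its score cannot beat the best size-m subset found so far. This is a simplified illustration rather than a tuned implementation, and the additive weights are hypothetical.

```python
def branch_and_bound(d, m, criterion):
    """Find the best m-feature subset, assuming criterion is monotone."""
    best = {"subset": None, "score": float("-inf")}
    evaluated = 0

    def recurse(subset, start):
        nonlocal evaluated
        score = criterion(subset)
        evaluated += 1
        # Monotonicity: no subset of `subset` can score higher than `score`,
        # so the whole branch can be pruned.
        if score <= best["score"]:
            return
        if len(subset) == m:
            best["subset"], best["score"] = subset, score
            return
        # Remove features at non-decreasing positions so that every subset is
        # generated once; positions above m could not reach size m.
        for pos in range(start, m + 1):
            recurse(subset[:pos] + subset[pos + 1:], pos)

    recurse(tuple(range(d)), 0)
    return best["subset"], best["score"], evaluated

# Hypothetical monotone criterion: a sum of non-negative feature weights.
WEIGHTS = [9, 8, 3, 5, 1]

def additive_criterion(subset):
    return sum(WEIGHTS[i] for i in subset)

bb_subset, bb_score, bb_evaluated = branch_and_bound(5, 2, additive_criterion)
```

How much work is saved depends on the criterion and on how early a good subset is found; in the worst case the procedure still visits every node of the search tree.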

16
**Suboptimal Selection Methods**

- Best Individual Feature
- Sequential Forward Selection (SFS)
- Sequential Backward Selection (SBS)
- “Plus l take away r” Selection
- Sequential Forward Floating Search (SFFS)
- Sequential Backward Floating Search (SBFS)

17
**Best individual Feature**

- Evaluate all $d$ features individually using a scalar criterion function.
- Select the $m$ best features.
- Clearly a suboptimal method.
- Complexity is $O(d)$.
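A minimal sketch of best-individual-feature selection: each feature receives one score, and the m highest-scoring features are kept. The scores below are made up for illustration.

```python
def best_individual_features(scores, m):
    """Return the indices of the m features with the highest individual scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]

# Hypothetical per-feature criterion values.
feature_scores = [0.2, 1.5, 0.7, 1.1]
top_two = best_individual_features(feature_scores, 2)
```

Because every feature is scored in isolation, two highly correlated features can both be selected even though the second adds little information.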

18
**Sequential Forward Selection (SFS)**

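- At the beginning, select the best single feature using a scalar criterion function.
- Then add one feature at a time: the one which, together with the already selected features, maximizes the criterion function $J(\cdot)$.
- A greedy algorithm: once a feature is added it cannot be removed.
- Each step needs $O(d)$ criterion evaluations, so selecting $m$ features needs $O(md)$ evaluations, which is $O(d^2)$ in the worst case.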
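The greedy behaviour can be sketched as below. The criterion values in `J_TABLE` are hypothetical and were chosen so that the best pair of features does not contain the best single feature, which shows why the inability to retract matters.

```python
def sequential_forward_selection(d, m, criterion):
    """Greedily grow a feature subset from empty to m features."""
    selected = []
    while len(selected) < m:
        remaining = [f for f in range(d) if f not in selected]
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical criterion values for every non-empty subset of three features.
J_TABLE = {
    frozenset({0}): 5, frozenset({1}): 4, frozenset({2}): 3,
    frozenset({0, 1}): 7, frozenset({0, 2}): 6, frozenset({1, 2}): 8,
    frozenset({0, 1, 2}): 9,
}

def table_criterion(subset):
    return J_TABLE[frozenset(subset)]

sfs_result = sequential_forward_selection(3, 2, table_criterion)
sfs_score = table_criterion(sfs_result)
```

Here SFS first picks feature 0 and is then locked into a pair containing it, ending with a score of 7, while the pair {1, 2} would score 8.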

19
**Sequential Backward Selection (SBS)**

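- At the beginning, select all $d$ features.
- Delete one feature at a time, keeping the subset that maximizes the criterion function $J(\cdot)$.
- Also a greedy algorithm: once a feature is deleted it cannot be brought back.
- Each step needs $O(d)$ criterion evaluations, so reducing $d$ features to $m$ needs $O(d^2)$ evaluations in the worst case.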
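A corresponding sketch of backward selection, again with a hypothetical criterion (additive weights with a penalty for selecting the redundant pair 0 and 1 together):

```python
def sequential_backward_selection(d, m, criterion):
    """Greedily shrink the full feature set down to m features."""
    selected = list(range(d))
    while len(selected) > m:
        # Drop the feature whose removal leaves the best-scoring subset.
        drop = max(selected,
                   key=lambda f: criterion([g for g in selected if g != f]))
        selected.remove(drop)
    return selected

# Hypothetical criterion, not taken from the presentation.
WEIGHTS = [9, 8, 3, 5]

def penalized_criterion(subset):
    score = sum(WEIGHTS[i] for i in subset)
    if 0 in subset and 1 in subset:
        score -= 6
    return score

sbs_result = sequential_backward_selection(4, 2, penalized_criterion)
```

Since SBS starts from all d features, its early criterion evaluations involve large subsets, which tends to make it more expensive than SFS when m is small.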

20
**“Plus l take away r” Selection**

- First add $l$ features by forward selection, then discard $r$ features by backward selection.
- The values of $l$ and $r$ have to be chosen by the user.
- Avoids the subset nesting problem encountered in SFS and SBS.
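One way to sketch the procedure is shown below, assuming l > r so that the subset grows overall. The stopping rule (trim back to m features once enough have been collected) is a choice made for this sketch, and the criterion table is hypothetical.

```python
def plus_l_take_away_r(d, m, criterion, l=2, r=1):
    """Alternate l forward steps and r backward steps until m features remain."""
    if not 0 <= r < l:
        raise ValueError("this sketch assumes 0 <= r < l")
    selected = []

    def add_best():
        remaining = [f for f in range(d) if f not in selected]
        selected.append(max(remaining, key=lambda f: criterion(selected + [f])))

    def drop_worst():
        # Remove the feature whose removal leaves the highest-scoring subset.
        worst = max(selected,
                    key=lambda f: criterion([g for g in selected if g != f]))
        selected.remove(worst)

    while True:
        for _ in range(l):
            if len(selected) < d:
                add_best()
        # Stop once the backward phase can land on m features (or none are left to add).
        if len(selected) - r >= m or len(selected) == d:
            while len(selected) > m:
                drop_worst()
            return sorted(selected)
        for _ in range(r):
            drop_worst()

# Hypothetical criterion values for every non-empty subset of three features.
J_TABLE = {
    frozenset({0}): 5, frozenset({1}): 4, frozenset({2}): 3,
    frozenset({0, 1}): 7, frozenset({0, 2}): 6, frozenset({1, 2}): 8,
    frozenset({0, 1, 2}): 9,
}

def table_criterion(subset):
    return J_TABLE[frozenset(subset)]

plr_result = plus_l_take_away_r(3, 2, table_criterion, l=2, r=1)
```

On this toy table the backward steps allow feature 0, which was picked first, to be discarded again, so the procedure reaches the pair {1, 2} that plain forward selection would miss.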

21
**Sequential Forward Floating Search (SFFS)**

- It is a generalized ‘plus l take away r’ algorithm.
- The values of $l$ and $r$ are determined automatically.
- Provides a close to optimal solution.
- Affordable computational cost.
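A simplified sketch of the floating idea: after every forward step, features are removed for as long as the reduced subset beats the best subset of that size seen so far. Details such as the minimum size at which backtracking starts are assumptions of this sketch, and the weights and penalties are hypothetical.

```python
def sffs(d, m, criterion):
    """Simplified sequential forward floating search for m of d features."""
    selected = []
    best = {}  # subset size -> (score, subset)
    while len(selected) < m:
        # Forward step: add the feature that helps most.
        remaining = [f for f in range(d) if f not in selected]
        add = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [add]
        score = criterion(selected)
        size = len(selected)
        if size not in best or score > best[size][0]:
            best[size] = (score, list(selected))
        # Conditional backward steps: only while they improve on the best
        # subset of the smaller size found so far.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return sorted(best[m][1]), best[m][0]

# Hypothetical criterion: additive weights with pairwise redundancy penalties.
WEIGHTS = [20, 18, 16, 1]
PENALTIES = {(0, 1): 14, (0, 2): 14}

def redundancy_criterion(subset):
    chosen = set(subset)
    score = sum(WEIGHTS[f] for f in chosen)
    for (a, b), penalty in PENALTIES.items():
        if a in chosen and b in chosen:
            score -= penalty
    return score

sffs_subset, sffs_score = sffs(4, 3, redundancy_criterion)
```

With this criterion, plain forward selection would stop at {0, 1, 2} with a score of 26, while the floating search drops feature 0 along the way and ends at {1, 2, 3} with a score of 35.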

22
**Sequential Backward Floating Search (SBFS)**

- It is also a generalized ‘plus l take away r’ algorithm, like SFFS.
- The values of $l$ and $r$ are also determined automatically.
- Provides a close to optimal solution, as SFFS does.
- More efficient than SFFS when $m$ is closer to $d$ than to 1.

23
**Class Separability Measures**

- Divergence
- Scatter Matrices

24
**Divergence**

- By the Bayes rule, given two classes $\omega_1$ and $\omega_2$ and a feature vector $x$, we select $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$.
- Hence the ratio $P(\omega_1|x) / P(\omega_2|x)$ has discriminating capability.

25
**Divergence**

- For given $P(\omega_1)$ and $P(\omega_2)$, the same information resides in $D_{12}(x) = \ln \dfrac{p(x|\omega_1)}{p(x|\omega_2)}$.
- For completely overlapping classes, $D_{12}(x) = 0$.

26
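**Divergence**

- Since $x$ takes different values, it is natural to consider the mean value over class $\omega_1$: $D_{12} = \int p(x|\omega_1) \ln \dfrac{p(x|\omega_1)}{p(x|\omega_2)} \, dx$
- Similarly, for $\omega_2$: $D_{21} = \int p(x|\omega_2) \ln \dfrac{p(x|\omega_2)}{p(x|\omega_1)} \, dx$
- The divergence is the sum $d_{12} = D_{12} + D_{21}$.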
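As a worked check, assume the usual definitions, so that the divergence can be written as the integral of (p1 - p2) ln(p1 / p2), where p1 and p2 are the class-conditional densities. For two univariate Gaussian classes this integral has a known closed form, and the sketch below compares it with a simple numerical integration. The integration range and step count are arbitrary choices for this example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def divergence_numeric(mu1, s1, mu2, s2, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule estimate of the integral of (p1 - p2) * ln(p1 / p2)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        p1 = gaussian_pdf(x, mu1, s1)
        p2 = gaussian_pdf(x, mu2, s2)
        if p1 > 0.0 and p2 > 0.0:  # guard against underflow in the tails
            total += (p1 - p2) * math.log(p1 / p2) * h
    return total

def divergence_closed_form(mu1, s1, mu2, s2):
    """Closed form of the divergence for two univariate Gaussians."""
    v1, v2 = s1 ** 2, s2 ** 2
    return 0.5 * (v1 / v2 + v2 / v1 - 2) + 0.5 * (mu1 - mu2) ** 2 * (1 / v1 + 1 / v2)

d_numeric = divergence_numeric(0.0, 1.0, 2.0, 1.0)
d_closed = divergence_closed_form(0.0, 1.0, 2.0, 1.0)
d_same = divergence_closed_form(1.0, 2.0, 1.0, 2.0)
```

For unit variances and means 0 and 2 both routes give a divergence of 4, and identical classes give 0, matching the statement that completely overlapping classes carry no discriminating information.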

27
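**Scatter Matrices**

- Computing the divergence is not easy for non-Gaussian distributions.
- The within-class scatter matrix is defined as $S_w = \sum_{i=1}^{M} P_i S_i$, where $M$ is the number of classes and $P_i$ is the prior probability of class $\omega_i$.
- $S_i$ is the covariance matrix for class $\omega_i$: $S_i = E[(x-\mu_i)(x-\mu_i)']$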

28
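**Scatter Matrices**

The between-class scatter matrix is defined as $S_b = \sum_{i=1}^{M} P_i (\mu_i-\mu_0)(\mu_i-\mu_0)'$, where $\mu_0 = \sum_{i=1}^{M} P_i \mu_i$ is the global mean vector and $P_i$ is the prior probability of class $\omega_i$.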

29
**Scatter Matrices**

The total mixture scatter matrix is defined as $S_m = E[(x-\mu_0)(x-\mu_0)']$, and it satisfies $S_m = S_w + S_b$.

30
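**Scatter Matrices**

The following criterion functions, among others, can be defined:

- $J_1 = \dfrac{\mathrm{trace}\{S_m\}}{\mathrm{trace}\{S_w\}}$
- $J_2 = \dfrac{|S_m|}{|S_w|}$
- $J_3 = \mathrm{trace}\{S_w^{-1} S_m\}$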
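The scatter matrices can be estimated from labelled samples as sketched below, assuming class priors are estimated by class frequencies and covariances use the 1/n normalisation. The example evaluates the trace ratio trace(Sm) / trace(Sw), a standard criterion of this kind. The data points are made up.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def outer_mean(points, center):
    """Average of (x - center)(x - center)' over the given points."""
    dim, n = len(center), len(points)
    return [[sum((p[a] - center[a]) * (p[b] - center[b]) for p in points) / n
             for b in range(dim)] for a in range(dim)]

def scatter_matrices(classes):
    """Return (Sw, Sb, Sm) for a list of classes, each a list of sample vectors."""
    all_points = [p for c in classes for p in c]
    mu0 = mean_vector(all_points)
    dim = len(mu0)
    s_w = [[0.0] * dim for _ in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for c in classes:
        prior = len(c) / len(all_points)  # P_i estimated by class frequency
        mu = mean_vector(c)
        s_i = outer_mean(c, mu)
        for a in range(dim):
            for b in range(dim):
                s_w[a][b] += prior * s_i[a][b]
                s_b[a][b] += prior * (mu[a] - mu0[a]) * (mu[b] - mu0[b])
    s_m = outer_mean(all_points, mu0)
    return s_w, s_b, s_m

def trace(matrix):
    return sum(matrix[k][k] for k in range(len(matrix)))

# Hypothetical two-class, two-feature data set.
class_a = [[0, 0], [2, 0], [0, 2], [2, 2]]
class_b = [[4, 4], [6, 4], [4, 6], [6, 6]]
s_w, s_b, s_m = scatter_matrices([class_a, class_b])
j_trace = trace(s_m) / trace(s_w)
```

A large value of the ratio means the samples are tightly clustered around their class means while the class means are far apart.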

31
**Scatter Matrices**

For a one-dimensional problem with two equally probable classes:

- $|S_w|$ is proportional to $\sigma_1^2 + \sigma_2^2$
- $|S_b|$ is proportional to $(\mu_1 - \mu_2)^2$
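Dividing the between-class quantity by the within-class quantity gives a per-feature score, (μ1 - μ2)² / (σ1² + σ2²), commonly called Fisher's discriminant ratio. The sketch below assumes this is the ratio meant here and uses it to rank genes; the function names and the expression values are hypothetical.

```python
from statistics import mean, pvariance

def fisher_discriminant_ratio(class1, class2):
    """(mu1 - mu2)^2 / (sigma1^2 + sigma2^2) for one feature.

    Raises ZeroDivisionError if both classes have zero variance.
    """
    return (mean(class1) - mean(class2)) ** 2 / (pvariance(class1) + pvariance(class2))

def rank_genes(samples1, samples2):
    """Rank genes by the ratio; each sample is a list of gene expression values."""
    n_genes = len(samples1[0])
    scores = [
        fisher_discriminant_ratio([s[g] for s in samples1], [s[g] for s in samples2])
        for g in range(n_genes)
    ]
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores

# Hypothetical data: three samples per class, two genes per sample.
samples_class1 = [[1, 5], [2, 4], [3, 6]]
samples_class2 = [[5, 5], [6, 6], [7, 4]]
gene_order, gene_scores = rank_genes(samples_class1, samples_class2)
```

Gene 0 separates the two classes well and scores about 12, while gene 1 has identical class means and scores 0.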

32
**Review of Minimum Redundancy feature selection methods**

We now discuss the two minimum redundancy feature selection methods given in the following papers:

- Ding and Peng (2003)
- Yu and Liu (2004)

33
**Review of Minimum Redundancy feature selection methods**

In Ding and Peng (2003):

- A filter method is used.
- The search algorithm is SFS.
- The first feature is selected using $\max V_I$ over all genes in the set $S$.

34
**Review of Minimum Redundancy feature selection methods**

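- Suppose $m$ features have already been selected for the set $X$.
- Additional features are selected from the set $Y = S - X$.
- The following two conditions are optimized simultaneously, where $I$ is the mutual information and $h$ is the class variable:
  1. Maximum relevance: $\max_{i \in Y} I(h, i)$
  2. Minimum redundancy: $\min_{i \in Y} \dfrac{1}{|X|} \sum_{j \in X} I(i, j)$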

35
**Review of Minimum Redundancy feature selection methods**

- The mutual information $I$ of two variables $x$ and $y$ is defined as $I(x, y) = \sum_{i,j} p(x_i, y_j) \log \dfrac{p(x_i, y_j)}{p(x_i)\, p(y_j)}$.
- The paper highlights the importance of minimum redundancy.
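Mutual information between two discrete variables can be estimated from paired observations by counting, as sketched below. The sketch assumes the variables are already discrete (continuous expression values would have to be discretized first) and uses base-2 logarithms, so the result is in bits; both are choices made for this example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(x, y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) / (p(x) p(y)) written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Two identical binary variables share 1 bit of information, while two independent ones share none; a redundancy term built from such values penalizes a candidate gene that largely repeats the genes already selected.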

36
**Review of Minimum Redundancy feature selection methods**

In Yu and Liu (2004):

- A filter method is used.
- The algorithm is:

Relevance analysis

1. Order the features by decreasing $ISU$ values.

Redundancy analysis

2. Initialize $F_i$ with the first feature in the list.
3. Find and remove all features for which $F_i$ forms an approximate redundant cover.
4. Set $F_i$ to the next remaining feature in the list and repeat step 3 until the end of the list.

37
**Review of Minimum Redundancy feature selection methods**

- Combines SFS with elimination.
- The entropy of a variable $X$ is defined as $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$.
- The entropy of $X$ after observing values of another variable $Y$ is defined as $H(X|Y) = -\sum_j P(y_j) \sum_i P(x_i|y_j) \log_2 P(x_i|y_j)$.
- The amount by which the entropy of $X$ decreases reflects the additional information about $X$ provided by $Y$, and is called the information gain: $IG(X|Y) = H(X) - H(X|Y)$.

38
**Review of Minimum Redundancy feature selection methods**

- Symmetrical uncertainty is defined as $SU(X, Y) = 2 \left[ \dfrac{IG(X|Y)}{H(X) + H(Y)} \right]$.
- Individual C-correlation ($ISU_i$): the correlation between any feature $F_i$ and the class $C$.
- Combined C-correlation ($CSU_{i,j}$): the correlation between any pair of features $F_i$ and $F_j$ ($i \ne j$) and the class $C$.
- Approximate redundant cover: for two features $F_i$ and $F_j$, $F_i$ forms an approximate redundant cover for $F_j$ iff $ISU_i \ge ISU_j$ and $ISU_i \ge CSU_{i,j}$.
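The definitions and the four-step procedure can be put together as in the sketch below. It assumes discrete feature values, uses the standard entropy-based symmetrical uncertainty, and computes the combined C-correlation as the symmetrical uncertainty between the joint variable (Fi, Fj) and the class; that last point is an interpretation made for this sketch. The function names and the data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(X|Y): entropy of xs within each group of equal ys, weighted by P(y)."""
    n = len(xs)
    total = 0.0
    for y, count in Counter(ys).items():
        group = [x for x, yy in zip(xs, ys) if yy == y]
        total += (count / n) * entropy(group)
    return total

def symmetrical_uncertainty(xs, ys):
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx - conditional_entropy(xs, ys)) / (hx + hy)

def redundancy_based_filter(features, labels):
    """Return indices of the features kept after removing redundant ones."""
    isu = [symmetrical_uncertainty(f, labels) for f in features]
    order = sorted(range(len(features)), key=lambda i: isu[i], reverse=True)
    kept = []
    while order:
        i = order.pop(0)  # next remaining feature in the ranked list
        kept.append(i)
        survivors = []
        for j in order:
            # Assumption: treat the pair (Fi, Fj) as one joint variable.
            csu = symmetrical_uncertainty(list(zip(features[i], features[j])), labels)
            if not (isu[i] >= isu[j] and isu[i] >= csu):
                survivors.append(j)
        order = survivors
    return kept

labels = [0, 0, 0, 0, 1, 1, 1, 1]
f_relevant = [0, 0, 0, 0, 1, 1, 1, 1]   # predicts the class perfectly
f_duplicate = list(f_relevant)           # exact copy, hence redundant
f_noise = [0, 1, 0, 1, 0, 1, 0, 1]       # unrelated to the class
kept_simple = redundancy_based_filter([f_relevant, f_duplicate, f_noise], labels)

# Two features that are useless alone but determine the class together.
f_a = [0, 0, 0, 0, 1, 1, 1, 1]
f_b = [0, 0, 1, 1, 0, 0, 1, 1]
xor_labels = [0, 0, 1, 1, 1, 1, 0, 0]
kept_xor = redundancy_based_filter([f_a, f_b], xor_labels)
```

In the first data set only the perfectly relevant feature survives. In the second, the combined C-correlation exceeds each individual C-correlation, so neither feature covers the other and both are kept.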

39
**Comparison with our Experimental Results**

- To investigate the problem of feature selection, we implemented a filter method.
- We used Fisher's discriminant ratio (FDR) as the criterion function.
- Initial gene selection was based on gene ranking.
- The Fisher and Loog-Duin discriminant techniques were then applied to transform the feature space.
- Linear and quadratic classifiers were then used.
- 10-fold cross-validation was applied.
- We used the Leukemia, Lung cancer, and Breast cancer data sets from the UCI repository.

40
**Comparison with our Experimental Results**

Table 1. Comparison of gene selection results.

Columns: Dataset, #G, #S, #SG, RBF, #S, #SG, FQ, LDQ, FL, LDL

Rows: Leukemia, Lung cancer, Breast cancer

- RBF = Redundancy Based Filter
- FQ = Fisher’s Discriminant + Quadratic classifier
- FL = Fisher’s Discriminant + Linear classifier
- LDQ = Loog-Duin’s Discriminant + Quadratic classifier
- LDL = Loog-Duin’s Discriminant + Linear classifier

41
**Comparison with our Experimental Results**

- From the table we can observe that RBF selected very compact gene sets in all cases.
- FQ and FL outperform LDQ and LDL on all 3 datasets.
- RBF outperforms all other methods on 1 dataset by a large margin.
- FQ and FL jointly outperform the others on 1 dataset, also by a large margin.
- RBF, FQ, and FL have comparable results on 1 dataset.

42
**Conclusions**

- We can conclude that minimum redundancy methods select very compact gene sets.
- This can help to identify and monitor the target disease or function types.

43
**Conclusions**

- In our experience, LDQ on average performs better than FQ, because Fisher discriminant analysis is linear in nature.
- Here we selected genes by FDR ranking, which may have enhanced the performance of FQ and FL.
- From the results we can also conclude that gene selection by ranking alone has some merit.

44
**References**

1. A. Blum and P. Langley. Selection of Relevant Features and Examples in Machine Learning. Artificial Intelligence, 97(1-2):245–271, 1997.
2. T. M. Cover. The Best Two Independent Measurements Are Not the Two Best. IEEE Transactions on Systems, Man, and Cybernetics, vol. 4, 1974.
3. C. Ding and H. C. Peng. Minimum Redundancy Feature Selection from Microarray Gene Expression Data. Proc. Second IEEE Computational Systems Bioinformatics Conf., 2003.
4. R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley and Sons, Inc., New York, NY, 2nd edition, 2000.
5. K. S. Van Horn and T. Martinez. The Minimum Feature Set Problem. Neural Networks, 7(3):491–494, 1994.

45
**References**

6. A. K. Jain, R. P. W. Duin, and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 2000.
7. M. Loog and R. P. W. Duin. Linear Dimensionality Reduction via a Heteroscedastic Extension of LDA: The Chernoff Criterion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):732–739, 2004.
8. S. Theodoridis and K. Koutroumbas. Pattern Recognition. Elsevier Academic Press, 2nd edition, 2003.
9. L. Yu and H. Liu. Redundancy Based Feature Selection for Microarray Data. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 737–742, 2004.

46
**Q & A**

Thank you
