# CSCE555 Bioinformatics Lecture 15 classification for microarray data Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:


CSCE555 Bioinformatics Lecture 15 classification for microarray data
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering, 2008
www.cse.sc.edu

Outline
◦ Classification problem in microarray data
◦ Classification concepts and algorithms
◦ Evaluation of classification algorithms
◦ Summary
4/16/2015

Example: breast cancer prognosis
◦ Objects: arrays
◦ Feature vectors: gene expression
◦ Predefined classes: clinical outcome (bad prognosis: recurrence < 5 yrs; good prognosis: recurrence > 5 yrs)
◦ A classification rule is built from the learning set and applied to a new array whose class is unknown (?); here the rule predicts good prognosis (metastasis > 5 yrs)
Reference: L. van’t Veer et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, Jan.

Example: leukemia tumor types
◦ Objects: arrays
◦ Feature vectors: gene expression
◦ Predefined classes: tumor type (B-ALL, T-ALL, AML)
◦ A classification rule is built from the learning set and applied to a new array whose class is unknown (?); here the rule predicts T-ALL
Reference: Golub et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439): 531-537.

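Classification/Discrimination
Each object (e.g. an array, i.e. a column of the table below) is associated with a class label (or response) Y ∈ {1, 2, …, K} and a feature vector (vector of predictor variables) of G measurements: X = (X_1, …, X_G). Aim: predict Y_new from X_new.

| | sample1 | sample2 | sample3 | sample4 | sample5 | … | New sample |
|---|---|---|---|---|---|---|---|
| 1 | 0.46 | 0.30 | 0.80 | 1.51 | 0.90 | ... | 0.34 |
| 2 | -0.10 | 0.49 | 0.24 | 0.06 | 0.46 | ... | 0.43 |
| 3 | 0.15 | 0.74 | 0.04 | 0.10 | 0.20 | ... | -0.23 |
| 4 | -0.45 | -1.03 | -0.79 | -0.56 | -0.32 | ... | -0.91 |
| 5 | -0.06 | 1.06 | 1.35 | 1.09 | -1.09 | ... | 1.23 |
| Y | Normal | Normal | Normal | Cancer | Cancer | | unknown = Y_new |

The columns sample1, sample2, … form X; the "New sample" column is X_new.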

Discrimination/Classification

Basic principles of discrimination
Each object is associated with a class label (or response) Y ∈ {1, 2, …, K} and a feature vector (vector of predictor variables) of G measurements: X = (X_1, …, X_G). Aim: predict Y from X.
Figure: objects grouped into predefined classes {1, 2, …, K}. For a labelled object, X = feature vector {colour, shape} and Y = class label = 2. For a new object with X = {red, square}, the classification rule must decide Y = ?

KNN: Nearest neighbor classifier
Based on a measure of distance between observations (e.g. Euclidean distance or one minus correlation).
The k-nearest neighbor rule (Fix and Hodges (1951)) classifies an observation X as follows:
◦ find the k observations in the learning set closest to X
◦ predict the class of X by majority vote, i.e., choose the class that is most common among those k observations.
The number of neighbors k can be chosen by cross-validation (more on this later).
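The rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `euclidean` and `knn_predict` and the toy learning set are assumptions made for the example, and ties in the vote are broken arbitrarily.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest learning-set points."""
    # Rank learning-set observations by distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    # Count the class labels of the k closest observations.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy learning set: two expression values per sample.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["Normal", "Normal", "Normal", "Cancer", "Cancer", "Cancer"]

prediction = knn_predict(train_X, train_y, (0.15, 0.15), k=3)
```

Passing a different `distance` function (for example one minus correlation) changes the notion of "closest" without touching the voting step.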

3-Nearest Neighbors
Figure: a query point q_f and its 3 nearest neighbors (2 x, 1 o).

Limitation of KNN: what is k?

SVM: Support Vector Machines
SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
In order to discriminate between two classes, given a training dataset:
◦ Map the data to a higher-dimensional space (feature space)
◦ Separate the two classes using an optimal linear separator

Key Ideas of SVM: Margins of Linear Separators
Maximum margin linear classifier

Optimal hyperplane
Figure: the optimal hyperplane, the margin ρ, and the support vectors.
Support vectors uniquely characterize the optimal hyperplane.

Finding the Support Vectors
◦ Lagrangian multiplier method for constrained optimization
◦ Inner product of vectors
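The slide's formulas did not survive extraction. As a reminder of what the Lagrangian method yields, here is the standard hard-margin derivation, written under the usual assumptions (not stated on the slide) that the labels are $y_i \in \{-1, +1\}$ and the training set is linearly separable.

$$
\begin{aligned}
\text{Primal:}\quad & \min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad \text{subject to}\quad y_i\,(w^T x_i + b) \ge 1 \ \ \text{for all } i \\
\text{Lagrangian:}\quad & L(w, b, \alpha) = \tfrac{1}{2}\lVert w\rVert^2 - \sum_i \alpha_i \left[\, y_i\,(w^T x_i + b) - 1 \,\right], \qquad \alpha_i \ge 0 \\
\text{Stationarity:}\quad & w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0 \\
\text{Dual:}\quad & \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j\, x_i^T x_j \quad \text{subject to}\quad \alpha_i \ge 0,\ \ \sum_i \alpha_i y_i = 0
\end{aligned}
$$

The training points with $\alpha_i > 0$ are the support vectors, and the data enter the dual only through the inner products $x_i^T x_j$, which is what later makes it possible to substitute a kernel.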

Key Ideas of SVM: Feature Space Mapping
Map the original data to some higher-dimensional feature space where the training set is linearly separable: Φ: x → φ(x)
(x1, x2) → (x1, x2, x1^2, x2^2, x1*x2, …)

The “Kernel Trick”
The linear classifier relies on the inner product between vectors: $K(x_i, x_j) = x_i^T x_j$.
If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the inner product becomes $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.
A kernel function is some function that corresponds to an inner product in some expanded feature space.
Example: 2-dimensional vectors $x = [x_1 \ x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$. Need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$:

$$
\begin{aligned}
K(x_i, x_j) &= (1 + x_i^T x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
&= [\,1 \ \ x_{i1}^2 \ \ \sqrt{2}\, x_{i1} x_{i2} \ \ x_{i2}^2 \ \ \sqrt{2}\, x_{i1} \ \ \sqrt{2}\, x_{i2}\,]^T \, [\,1 \ \ x_{j1}^2 \ \ \sqrt{2}\, x_{j1} x_{j2} \ \ x_{j2}^2 \ \ \sqrt{2}\, x_{j1} \ \ \sqrt{2}\, x_{j2}\,] \\
&= \varphi(x_i)^T \varphi(x_j),
\end{aligned}
$$

where $\varphi(x) = [\,1 \ \ x_1^2 \ \ \sqrt{2}\, x_1 x_2 \ \ x_2^2 \ \ \sqrt{2}\, x_1 \ \ \sqrt{2}\, x_2\,]$.
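The identity in the example can also be checked numerically. The sketch below is only an illustration: the function names and the two test vectors are assumptions chosen for the example.

```python
import math

def poly2_kernel(xi, xj):
    """K(xi, xj) = (1 + xi . xj)^2 for 2-dimensional vectors."""
    return (1.0 + xi[0] * xj[0] + xi[1] * xj[1]) ** 2

def phi(x):
    """Explicit 6-dimensional feature map matching the quadratic kernel."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical test vectors.
xi, xj = (0.5, -1.0), (2.0, 0.25)
lhs = poly2_kernel(xi, xj)      # kernel evaluated in the original 2-D space
rhs = dot(phi(xi), phi(xj))     # inner product in the expanded 6-D space
```

Both quantities agree up to floating-point rounding, but the kernel needs only one 2-dimensional inner product, whereas the explicit route builds two 6-dimensional vectors first.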

Examples of Kernel Functions
◦ Linear: $K(x_i, x_j) = x_i^T x_j$
◦ Polynomial of power p: $K(x_i, x_j) = (1 + x_i^T x_j)^p$
◦ Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\left(-\lVert x_i - x_j \rVert^2 / (2\sigma^2)\right)$
◦ Sigmoid: $K(x_i, x_j) = \tanh(\beta_0\, x_i^T x_j + \beta_1)$
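As a rough sketch, the kernels listed on this slide can be written as plain Python functions. The Gaussian formula is missing from the transcript, so the common form exp(-||xi - xj||^2 / (2 sigma^2)) is assumed here; the function names and default parameter values are also assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, p=2):
    return (1.0 + dot(xi, xj)) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    # Assumed standard RBF form; the slide's own formula was lost in extraction.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(xi, xj) + beta1)

# Hypothetical example vectors.
a, b = (1.0, 2.0), (3.0, 4.0)
values = {
    "linear": linear_kernel(a, b),
    "polynomial": polynomial_kernel(a, b, p=2),
    "gaussian": gaussian_kernel(a, b, sigma=1.0),
    "sigmoid": sigmoid_kernel(a, b),
}
```

Each function takes two vectors and returns a single similarity score, so any of them can stand in wherever a linear classifier uses an inner product.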

SVM
Advantages:
◦ maximize the margin between two classes in the feature space characterized by a kernel function
◦ are robust with respect to high input dimension
Disadvantages:
◦ difficult to incorporate background knowledge
◦ sensitive to outliers

Variable/Feature Selection with SVMs
Recursive Feature Elimination:
◦ Train a linear SVM
◦ Remove the variables with the lowest weights (those variables affect classification the least), e.g., remove the lowest 50% of variables
◦ Retrain the SVM with the remaining variables and repeat until classification performance is reduced
Very successful.
Other formulations exist where minimizing the number of variables is folded into the optimization problem.
Similar algorithms exist for non-linear SVMs.
Some of the best and most efficient variable selection methods.
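The elimination loop can be sketched in plain Python. Two simplifications are assumed and are not from the lecture: the linear SVM is trained by batch subgradient descent on the regularized hinge loss (a stand-in for a real SVM solver), and the loop stops at a fixed number of features rather than monitoring classification performance. All names, parameters, and the toy data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stand-in linear SVM: subgradient descent on hinge loss + L2 penalty.
    Labels y must be -1 or +1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:              # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def rfe(X, y, n_keep=1, drop_fraction=0.5):
    """Recursive feature elimination; returns indices of surviving features."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_active, y)
        # Positions of active features, smallest absolute weight first.
        ranked = sorted(range(len(active)), key=lambda j: abs(w[j]))
        n_drop = max(1, int(len(active) * drop_fraction))
        n_drop = min(n_drop, len(active) - n_keep)
        dropped = set(ranked[:n_drop])
        active = [f for pos, f in enumerate(active) if pos not in dropped]
    return active

# Hypothetical toy data: only feature 2 separates the two classes.
X = [[0.1, -0.2, 2.0, 0.05], [-0.1, 0.1, 1.5, -0.05], [0.2, 0.0, 1.8, 0.1],
     [0.0, -0.1, -2.0, 0.0], [-0.2, 0.2, -1.7, -0.1], [0.1, 0.1, -1.9, 0.05]]
y = [1, 1, 1, -1, -1, -1]
selected = rfe(X, y, n_keep=1)
```

In practice the stopping point would be chosen by estimating accuracy at each round, for instance by cross-validation, rather than by a fixed `n_keep`.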

Software
◦ A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
◦ Some implementations (such as LIBSVM) can handle multi-class classification
◦ SVMLight and LibSVM are among the earliest implementations of SVM
◦ Several Matlab toolboxes for SVM are also available

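How to Use SVM to Classify Microarray Data
Prepare the data in the LibSVM format, one sample per line: the label, followed by the index and the value of each non-zero feature:
`<label> <index1>:<value1> <index2>:<value2> ...`
Usage: `svm-train [options] training_set_file [model_file]`
Examples of options: `-s 0 -c 10 -t 1 -g 1 -r 1 -d 3` (here -s 0 selects C-SVC, -c sets the cost, -t 1 selects the polynomial kernel, and -g, -r, -d set the kernel's gamma, coef0, and degree)
Usage: `svm-predict [options] test_file model_file output_file`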
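Converting an expression matrix into this sparse text format takes only a few lines. The sketch below is illustrative: the helper name `to_libsvm_line`, the numeric label coding (+1/-1), and the sample values are assumptions, and feature indices start at 1 as in LibSVM's example files.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> <index>:<value> ...', skipping zero values."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):   # 1-based feature indices
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)

# Hypothetical samples: (label, expression values), coded +1 = Cancer, -1 = Normal.
samples = [
    (-1, [0.46, 0.0, 0.15, -0.45]),
    (1, [1.51, 0.06, 0.0, -0.56]),
]
lines = [to_libsvm_line(label, features) for label, features in samples]
# `lines` can then be written to the training_set_file passed to svm-train.
```

Since microarray data are usually dense, nearly every feature will appear on every line; the sparse format simply leaves out exact zeros.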

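Decision tree classifiers
Figure: a tree that tests Gene 1 (M_i1 < -0.67) and Gene 2 (M_i2 > 0.18) with yes/no branches leading to leaves labelled 0, 2, and 1.
Advantage: transparent rules, easy to interpret.
Example data (one column per sample):
G1: 0.1, -0.2, 0.3
G2: 0.3, 0.4, 0.4
G3: …, …, …
Class: 0, 1, 0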

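Ensemble classifiers
Figure: the training set X_1, X_2, …, X_100 is resampled 500 times (Resample 1, Resample 2, …, Resample 499, Resample 500); a classifier is built on each resample (Classifier 1, Classifier 2, …, Classifier 499, Classifier 500), and the classifiers are combined into an aggregate classifier.
Examples: Bagging, Boosting, Random Forest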

Aggregating classifiers: Bagging
Figure: the training set (arrays) X_1, X_2, …, X_100 is resampled with replacement 500 times, each resample X*_1, X*_2, …, X*_100 having the same size as the original; a tree is grown on each resample (Tree 1, Tree 2, …, Tree 499, Tree 500).
Let the trees vote: each tree assigns the test sample to a class (e.g. Class 1, Class 2, Class 1, …), and the votes are tallied, here 90% Class 1 and 10% Class 2.
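A compact sketch of bagging follows. To stay short it uses one-level decision trees (stumps) as the base classifier instead of full trees, which is a simplification rather than the method on the slide; the function names, the number of classifiers, and the toy data are likewise assumptions.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a one-split tree: the single-feature threshold rule with fewest errors."""
    best = None
    labels = sorted(set(y))
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for lo in labels:          # label predicted when feature j <= t
                for hi in labels:      # label predicted when feature j > t
                    errors = sum((lo if row[j] <= t else hi) != yi
                                 for row, yi in zip(X, y))
                    if best is None or errors < best[0]:
                        best = (errors, j, t, lo, hi)
    _, j, t, lo, hi = best
    return lambda row: lo if row[j] <= t else hi

def bagging_fit(X, y, n_classifiers=25, seed=0):
    """Train one stump per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_classifiers):
        idx = [rng.randrange(n) for _ in range(n)]   # sample n cases with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote; also returns the fraction of votes for the winning class."""
    votes = Counter(model(row) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)

# Hypothetical toy data: one expression value per array, two classes.
X = [[-2.0], [-1.5], [-1.0], [-0.8], [0.9], [1.1], [1.6], [2.0]]
y = [1, 1, 1, 1, 2, 2, 2, 2]
models = bagging_fit(X, y)
label, vote_share = bagging_predict(models, [-1.9])
```

The vote share plays the role of the "90% Class 1, 10% Class 2" tally in the figure and gives a rough indication of how confident the ensemble is.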

Weka Data Mining Toolbox
The Weka package (Java) includes:
◦ All previous classifiers
◦ Neural networks
◦ Projection pursuit
◦ Bayesian belief networks
◦ And more

Feature Selection in Classification
What: select a subset of features
Why:
◦ Can lead to better classification performance by removing variables that are noise with respect to the outcome
◦ May provide useful insights into the biology
◦ Can eventually lead to diagnostic tests (e.g., “breast cancer chip”)

Classifier Performance assessment
Any classification rule needs to be evaluated for its performance on future samples. In microarray studies, a large independent population-based collection of samples is almost never available during the initial classifier-building phase.
One therefore needs to estimate future performance from what is available: often the same set that is used to build the classifier.
The performance of the classifier can be assessed based on:
◦ Cross-validation
◦ Test set
◦ Independent testing on a future dataset

Diagram of performance assessment
Figure: three ways to assess a classifier built on the training set.
◦ Resubstitution estimation: performance is assessed on the same training set.
◦ Test set estimation: performance is assessed on an independent test set.
◦ Cross validation: the training set is split into a (CV) learning set and a (CV) test set, and performance is assessed on the (CV) test set.

Performance assessment
V-fold cross-validation (CV) estimation: the cases in the learning set are randomly divided into V subsets of (nearly) equal size. Build classifiers by leaving one subset out at a time; compute the test set error rate on each left-out subset and average the error rates.
◦ Bias-variance tradeoff: smaller V can give larger bias but smaller variance
◦ Computationally intensive
Leave-one-out cross validation (LOOCV) is the special case V = n. Works well for stable classifiers (k-NN, LDA, SVM).
Supplementary slide
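The V-fold procedure can be sketched as follows. The function names, the 1-nearest-neighbor classifier used as the example learner, and the toy data are assumptions for illustration. The sketch pools errors over all folds and divides by n, which equals the average of the fold error rates when the folds have equal size.

```python
import random

def v_fold_indices(n, v, seed=0):
    """Randomly split indices 0..n-1 into v folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::v] for f in range(v)]

def cross_val_error(X, y, fit, v=5, seed=0):
    """Estimate the error rate by V-fold CV; `fit(X, y)` must return a predict function."""
    errors = 0
    for held_out in v_fold_indices(len(X), v, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        # Count mistakes on the left-out fold only.
        errors += sum(predict(X[i]) != y[i] for i in held_out)
    return errors / len(X)

def fit_1nn(train_X, train_y):
    """Example learner: 1-nearest-neighbor with squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict

# Hypothetical toy data: two well-separated groups of five arrays each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
     [2.0, 2.1], [2.2, 1.9], [1.9, 2.0], [2.1, 2.2], [2.0, 1.8]]
y = ["Normal"] * 5 + ["Cancer"] * 5

cv5_error = cross_val_error(X, y, fit_1nn, v=5)          # 5-fold CV
loocv_error = cross_val_error(X, y, fit_1nn, v=len(X))   # leave-one-out (V = n)
```

Any step that uses the class labels, such as feature selection or choosing k, would need to be repeated inside `fit` for each fold; otherwise the left-out cases have already influenced the classifier and the estimate is too optimistic.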

Which to use depends mostly on sample size
◦ If the sample is large enough, split it into test and training groups.
◦ If the sample is barely adequate for either testing or training, use leave-one-out.
◦ In between, consider V-fold. This method can give more accurate estimates than leave-one-out, but reduces the size of the training set.

Summary
◦ Microarray classification task
◦ Classifiers: KNN, SVM, decision tree
◦ Software: Weka, LibSVM
◦ Classifier evaluation, cross-validation

Acknowledgement
Terry Speed, Jean Yee Hwa Yang, Jane Fridlyand
