Applications of Supervised Learning in Bioinformatics Yen-Jen Oyang Dept. of Computer Science and Information Engineering.


1 Applications of Supervised Learning in Bioinformatics Yen-Jen Oyang Dept. of Computer Science and Information Engineering

2 Problem Definition of Supervised Learning (or Data Classification) In a supervised learning problem, each sample is described by a set of feature values and belongs to one of the predefined classes. The goal is to derive, from a given set of training samples, a set of rules that predicts which class an incoming query sample belongs to. Supervised learning is also called data classification.

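3 The Vector Space Model [Figure: an n × m matrix with rows sample 1, sample 2, ..., sample n and columns feature 1, feature 2, ..., feature m; each sample is labeled with one of Class 1, Class 2, ..., Class C.]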
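
As a minimal sketch of the vector space model above, each sample can be stored as a row of m feature values together with a class label. The names `samples` and `labels` and all numeric values are illustrative assumptions, not data from the presentation.

```python
# Toy vector space model: n samples, each with m feature values and one class label.
# All values below are made up for illustration.
samples = [
    [0.2, 1.5, 3.1],   # sample 1: feature 1 .. feature m
    [0.4, 1.1, 2.9],   # sample 2
    [5.0, 0.3, 0.7],   # sample 3
    [4.8, 0.2, 0.9],   # sample 4
]
labels = ["Class 1", "Class 1", "Class 2", "Class 2"]

n = len(samples)               # number of samples (rows)
m = len(samples[0])            # number of features (columns)
classes = sorted(set(labels))  # the predefined classes seen in the data

print(n, m, classes)
```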

4 Application of Supervised Learning in Microarray Data Analysis In microarray data analysis, supervised learning algorithms have been employed to predict the class of an incoming query sample based on existing samples with known classes.

5 Application of Supervised Learning in Microarray Data Analysis For example, the Leukemia data set contains 72 samples and 7129 genes: 25 Acute Myeloid Leukemia (AML) samples, 38 B-cell Acute Lymphoblastic Leukemia (B-cell ALL) samples, and 9 T-cell Acute Lymphoblastic Leukemia (T-cell ALL) samples.

6 Model of the Leukemia Dataset [Figure: a 72 × 7129 matrix with rows sample 1, sample 2, ..., sample 72 and columns gene 1, gene 2, ..., gene 7129; each sample is labeled Class 1, Class 2, or Class 3.]

7 Training Process From a mathematical point of view, the task of a supervised learning algorithm in the training stage is to identify curves that separate samples of different classes. The class of an incoming query sample is then predicted by referring to the separating curves identified during the training stage.

8 [Figure: only the label "query" survives.]

9 The Basis of Kernel Regression

10 Problem Definition of Kernel Density Estimation (KDE) with Gaussian Kernels Given a set of samples randomly taken from a probability distribution, we want to find a set of Gaussian functions and the corresponding weights that together give an approximate probability density function. [Equation not preserved in the transcript.]
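
The slide's formula did not survive, so the following is only a sketch of the general idea: a density estimate built as a weighted sum of Gaussian functions, one centered on each sample. The function names, the equal default weights (1/n), and the example widths are assumptions, not the presentation's definitions.

```python
import math

def gaussian_kernel(x, center, sigma):
    """Normalized spherical Gaussian in d dimensions, centered on `center`."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * sigma ** 2) ** (d / 2.0)
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm

def kde(x, samples, sigmas, weights=None):
    """Approximate density at x as a weighted sum of Gaussians, one per sample.

    `weights` defaults to 1/n each (an assumption); they should sum to 1
    so that the estimate integrates to 1.
    """
    n = len(samples)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * gaussian_kernel(x, s, sg)
               for w, s, sg in zip(weights, samples, sigmas))

# Illustrative 1-D samples with a per-sample width.
samples = [(1.0,), (2.0,), (4.0,)]
sigmas = [0.5, 0.5, 1.0]
print(kde((2.0,), samples, sigmas))
```

Allowing a separate width per sample matches the later remark that each kernel function typically has a varying width.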

11 The KDE Based Predictor The KDE based learning algorithm constructs one approximate probability density function for each class of samples. Prediction is conducted based on a likelihood function. [Equation not preserved in the transcript.]
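
A minimal sketch of a predictor of this kind follows: build one density estimate per class and predict the class with the largest score. Because the slide's likelihood function is not preserved, the score used here (class size times class density) is an assumption, as are the fixed width `sigma`, the function names, and the toy data.

```python
import math

def gaussian_kernel(x, center, sigma):
    """Normalized spherical Gaussian in d dimensions."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2) ** (d / 2.0)

def class_density(x, points, sigma):
    """Equal-weight KDE built from the samples of a single class."""
    return sum(gaussian_kernel(x, p, sigma) for p in points) / len(points)

def predict(x, training, sigma=1.0):
    """Return the class whose size-weighted density at x is largest.

    `training` maps a class label to the list of samples in that class.
    The size weighting is an assumed stand-in for the slide's likelihood.
    """
    scores = {label: len(points) * class_density(x, points, sigma)
              for label, points in training.items()}
    return max(scores, key=scores.get)

# Toy two-class training set (made-up values).
training = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0)],
}
print(predict((0.5, 0.5), training))  # A
print(predict((5.5, 5.0), training))  # B
```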

12 The Decision Function of the RVKDE Based Predictor

13   With the KDE based predictor, each training sample is associated with a kernel function, typically with a varying width.
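
The transcript does not say how the varying widths are chosen. One common heuristic, used here purely as an assumption, sets each sample's width proportional to the distance to its k-th nearest neighbor, so that isolated samples get wide kernels and clustered samples get narrow ones. The function `knn_width` and the parameters `k` and `beta` are hypothetical.

```python
import math

def knn_width(index, samples, k=2, beta=1.0):
    """Width for samples[index]: beta times the distance to its k-th nearest neighbor."""
    x = samples[index]
    dists = sorted(math.dist(x, s) for j, s in enumerate(samples) if j != index)
    return beta * dists[k - 1]

# Three clustered points and one isolated point (made-up values).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
widths = [knn_width(i, samples) for i in range(len(samples))]
print(widths)  # the isolated point gets a much larger width
```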

14 An Example of Supervised Learning (Data Classification) Given the data set shown on the next slide, can we figure out a set of rules that predicts the classes of samples?

15 Data Set
Data      Class   Data      Class   Data      Class
(15,33)   O       (18,28)   ×       (16,31)   O
(9,23)    ×       (15,35)   O       (9,32)    ×
(8,15)    ×       (17,34)   O       (11,38)   ×
(11,31)   O       (18,39)   ×       (13,34)   O
(13,37)   ×       (14,32)   O       (19,36)   ×
(18,32)   O       (25,18)   ×       (10,34)   ×
(16,38)   ×       (23,33)   ×       (15,30)   O
(12,33)   O       (21,28)   ×       (13,22)   ×

16 Distribution of the Data Set [Figure: scatter plot of the data set, with O and × markers and axis ticks at 10, 15, 20 and 30.]

17 Rule Based on Observation
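
The rule shown on this slide is not preserved in the transcript. As an illustration only, one rule that can be read off the data set is the bounding box of the O samples: the sketch below computes that box and checks it against all 24 samples. The box rule and the names `DATA` and `rule` are assumptions, and "X" stands in for the × marker.

```python
# The 24 samples from the Data Set slide; "X" stands in for the × marker.
DATA = [
    ((15, 33), "O"), ((18, 28), "X"), ((16, 31), "O"), ((9, 23), "X"),
    ((15, 35), "O"), ((9, 32), "X"), ((8, 15), "X"), ((17, 34), "O"),
    ((11, 38), "X"), ((11, 31), "O"), ((18, 39), "X"), ((13, 34), "O"),
    ((13, 37), "X"), ((14, 32), "O"), ((19, 36), "X"), ((18, 32), "O"),
    ((25, 18), "X"), ((10, 34), "X"), ((16, 38), "X"), ((23, 33), "X"),
    ((15, 30), "O"), ((12, 33), "O"), ((21, 28), "X"), ((13, 22), "X"),
]

# Bounding box of the O samples.
o_points = [p for p, c in DATA if c == "O"]
x_min, x_max = min(x for x, _ in o_points), max(x for x, _ in o_points)
y_min, y_max = min(y for _, y in o_points), max(y for _, y in o_points)

def rule(point):
    """Predict "O" inside the box of O samples, "X" otherwise."""
    x, y = point
    return "O" if x_min <= x <= x_max and y_min <= y <= y_max else "X"

errors = [p for p, c in DATA if rule(p) != c]
print((x_min, x_max), (y_min, y_max), len(errors))
```

On this data set the box is 11 ≤ x ≤ 18 and 30 ≤ y ≤ 35, and no × sample falls inside it, so the rule reproduces every training label. It says nothing about how well such a rule would generalize to new samples.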

18 Rule Generated by a Kernel Density Estimation Based Algorithm Let [definition not preserved] and [definition not preserved]. If [condition not preserved], then prediction = "O". Otherwise prediction = "×".

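19 Class O samples: (15,33) (11,31) (18,32) (12,33) (15,35) (17,34) (14,32) (16,31) (13,34) (15,30)
Values (only 8 of the 10 survive in the transcript, so they are not paired with samples): 1.723, 2.745, 2.327, 1.794, 1.973, 2.045, 1.794, 2.027

Class × samples and values:
Data      Value
(9,23)    6.458
(8,15)    10.08
(13,37)   2.939
(16,38)   2.745
(18,28)   5.451
(18,39)   3.287
(25,18)   10.86
(23,33)   5.322
(21,28)   5.070
(9,32)    4.562
(11,38)   3.463
(19,36)   3.587
(10,34)   3.232
(13,22)   6.260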
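
To tie the example back to the KDE based predictor, the sketch below applies a simple version of it to the 24 samples of the data set. It does not reproduce the rule from the presentation: the single fixed width `SIGMA = 2.0`, the class-size weighting, and all function names are assumptions, and the per-sample values listed above are not used. "X" stands in for the × marker.

```python
import math

# The 24 samples from the Data Set slide; "X" stands in for the × marker.
DATA = [
    ((15, 33), "O"), ((18, 28), "X"), ((16, 31), "O"), ((9, 23), "X"),
    ((15, 35), "O"), ((9, 32), "X"), ((8, 15), "X"), ((17, 34), "O"),
    ((11, 38), "X"), ((11, 31), "O"), ((18, 39), "X"), ((13, 34), "O"),
    ((13, 37), "X"), ((14, 32), "O"), ((19, 36), "X"), ((18, 32), "O"),
    ((25, 18), "X"), ((10, 34), "X"), ((16, 38), "X"), ((23, 33), "X"),
    ((15, 30), "O"), ((12, 33), "O"), ((21, 28), "X"), ((13, 22), "X"),
]
SIGMA = 2.0  # assumed fixed kernel width, chosen only for illustration

def gaussian_kernel(x, center, sigma):
    """Normalized 2-D spherical Gaussian."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def class_density(x, points, sigma=SIGMA):
    """Equal-weight KDE built from the samples of one class."""
    return sum(gaussian_kernel(x, p, sigma) for p in points) / len(points)

def predict(x, data=DATA, sigma=SIGMA):
    """Predict the class with the largest size-weighted density at x."""
    by_class = {}
    for point, label in data:
        by_class.setdefault(label, []).append(point)
    scores = {label: len(points) * class_density(x, points, sigma)
              for label, points in by_class.items()}
    return max(scores, key=scores.get)

print(predict((15, 33)))  # O: surrounded by O samples
print(predict((25, 18)))  # X: far from every O sample
```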

