Presentation on theme: "Learning Kernel Classifiers 1. Introduction 2005. 04. 25 Summarized by In-Hee Lee."— Presentation transcript:

1 Learning Kernel Classifiers 1. Introduction 2005. 04. 25 Summarized by In-Hee Lee

2 1. Overview A short overview of supervised, unsupervised, and reinforcement learning. A taste of kernel classifiers. A discussion of which theoretical questions are of particular and practical importance.

3 1.1 The Learning Problem and (Statistical) Inference The learning problem: given a sample of limited size, find a concise description of the data. Depending on the problem statement, we distinguish supervised learning, unsupervised learning, and reinforcement learning.

4 1.1 The Learning Problem and (Statistical) Inference Supervised learning: the given data is a sample of input-output patterns. A concise description of the data is a function that can produce the output given the input. Classification learning, preference learning, and function learning differ in the type of the output space.

5 1.1 The Learning Problem and (Statistical) Inference Supervised learning (cont’d). Classification learning: the output space has no structure except whether two elements of the output space (classes) are equal or not; for example, the classification of images to the classes “image of the digit x”. Preference learning: the output space is an ordered space, i.e., we can tell whether two elements of the output space (ranks) are equal or which one is to be preferred; for example, learning to arrange Web pages such that the most relevant pages are ranked highest. Function learning: the output space is a metric space such as the real numbers, so it is possible to use gradient descent techniques whenever the function is differentiable.
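As an illustration of function learning, a minimal gradient-descent sketch (not from the slides) that fits a linear function to real-valued targets under squared loss; the function name, learning rate, and toy data are illustrative assumptions:

```python
import numpy as np

def fit_linear_by_gradient_descent(X, y, lr=0.05, epochs=500):
    """Fit a linear function f(x) = <w, x> + b to real-valued targets by
    minimizing the mean squared error with plain gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        residual = X @ w + b - y                # f(x_i) - y_i for every example
        w -= lr * 2.0 * (X.T @ residual) / n    # gradient of the MSE w.r.t. w
        b -= lr * 2.0 * residual.mean()         # gradient of the MSE w.r.t. b
    return w, b

# Toy usage: recover y = 2*x (plus noise) from one-dimensional samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=100)
print(fit_linear_by_gradient_descent(X, y))    # w close to [2.0], b close to 0.0
```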

6 [Diagram: Input → Mapping f → Output]

7 1.1 The Learning Problem and (Statistical) Inference Unsupervised learning: we are given only a sample of objects without associated target values. A concise description of the data could be a set of clusters or a probability density. Given a training sample of objects, the task is to extract some structure from them: “If some structure exists in the training objects, it is possible to take advantage of this redundancy and find a short description of the data.” Example: clustering algorithms.

8 1.1 The Learning Problem and (Statistical) Inference Unsupervised learning (cont’d). Clustering algorithms: given a fixed number of clusters, find a grouping of the objects such that similar objects belong to the same cluster. Closely related to mixture models.
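A minimal k-means sketch (one standard clustering algorithm of this kind, chosen here purely for illustration; the slides do not name a specific algorithm) that groups objects into a fixed number of clusters so that similar objects end up in the same cluster; the function name and parameters are illustrative:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Group the rows of X into k clusters so that each object is assigned
    to the cluster with the nearest mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial cluster means
    for _ in range(n_iter):
        # Assignment step: each object goes to its closest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned objects.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                # converged
            break
        centers = new_centers
    return labels, centers
```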

10 1.1 The Learning Problem and (Statistical) Inference Reinforcement learning considers the scenario of a dynamic environment that yields state-action-reward triples as the data; for example, the problem of learning to play chess. The concise description of the data is a strategy that maximizes the reward over time. The learner is not told which actions to take in a given state, so there is a trade-off between exploitation and exploration.
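A minimal epsilon-greedy sketch (illustration only, not from the slides) of the exploitation/exploration trade-off: with a small probability the learner explores a random action, otherwise it exploits the action with the highest estimated reward; the names and the epsilon value are assumptions:

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon=0.1, rng=np.random.default_rng()):
    """Pick an action given estimated rewards q_values: with probability
    epsilon explore (random action), otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # exploration
    return int(np.argmax(q_values))               # exploitation
```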

11 [Diagram: Dynamic Environment → Strategy for Maximizing Rewards]

12 1.2 Learning Kernel Classifiers A typical classification problem. Problem specification: design a system that can learn to recognize handwritten zip codes on mail envelopes. Representation: the concatenation of the rows of the image matrix of intensity values. Learning algorithm: a nearest-neighbor classifier. To classify a new test image, assign it to the class of the training image closest to it. It has almost optimal performance in the limit of a large number of training images.
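A minimal sketch (not from the slides) of such a nearest-neighbor classifier, assuming NumPy arrays of intensity images flattened by concatenating their rows and Euclidean distance; the function and variable names are illustrative:

```python
import numpy as np

def nearest_neighbor_classify(train_images, train_labels, test_image):
    """Classify a test image by the label of the closest training image,
    where each image is represented as the concatenation of its rows of
    intensity values and 'closest' means smallest Euclidean distance."""
    X = train_images.reshape(len(train_images), -1).astype(float)  # flatten to vectors
    x = test_image.reshape(-1).astype(float)
    distances = np.linalg.norm(X - x, axis=1)   # distance to every training image
    return train_labels[distances.argmin()]     # label of the nearest neighbor
```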

13 1.2 Learning Kernel Classifiers Major problems of the nearest-neighbor classifier: 1. It requires a distance measure that is small between images showing the same digit and large between images showing different digits. With the Euclidean distance, not all of the closest images seem to belong to the correct class, so a better representation is needed. 2. It requires storage of the whole training sample and the computation of the distance to all training samples for each classification of a new image. This becomes a computational problem as soon as the dataset gets larger than a few hundred examples.

14 1.2 Learning Kernel Classifiers Kernel classifier, addressing the second problem: use a linear classifier function for each class. These functions are simple and quick to compute, and only the weight vectors need to be stored rather than the whole training sample. Assigning similar values to images from the same class is guaranteed.
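A minimal sketch (not from the slides) of classification with one linear function per class: each class is scored by an inner product with its weight vector, so only the weight vectors have to be kept at classification time; the names are illustrative and the weight vectors are assumed to have been learned already:

```python
import numpy as np

def linear_classify(weight_vectors, image):
    """Assign an image to the class whose linear function gives the largest
    value, f_y(x) = <w_y, x>, with one weight vector w_y per class."""
    x = image.reshape(-1).astype(float)   # concatenate rows of intensity values
    scores = weight_vectors @ x           # one inner product per class
    return int(scores.argmax())           # predicted class
```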

15 1.2 Learning Kernel Classifiers Kernel classifier, addressing the first problem: use a generalized notion of a distance measure, given by a feature mapping φ: X → K and the kernel k(x, x′) = ⟨φ(x), φ(x′)⟩. For parameter vectors that are linear combinations of the mapped training examples, w = Σ_i α_i φ(x_i), the classifier function becomes a linear combination of inner product functions in feature space, f(x) = Σ_i α_i k(x_i, x). A linear function involving a kernel in this way is known as a kernel classifier.
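A minimal sketch (not from the slides) of evaluating kernel classifiers of the form f_y(x) = Σ_i α_{y,i} k(x_i, x), one per class; the Gaussian (RBF) kernel is just one possible choice of k, the coefficients are assumed to have been learned already, and all names are illustrative:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.05):
    """One common kernel choice (Gaussian/RBF); any function of the form
    k(x, z) = <phi(x), phi(z)> for some feature mapping phi would do."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_classify(alphas, train_images, test_image, kernel=rbf_kernel):
    """Evaluate one kernel classifier per class, f_y(x) = sum_i alphas[y, i] * k(x_i, x),
    and predict the class with the largest value (coefficients assumed given)."""
    x = test_image.reshape(-1).astype(float)
    k_vals = np.array([kernel(img.reshape(-1).astype(float), x)
                       for img in train_images])   # kernel value to every training image
    scores = alphas @ k_vals                        # one linear combination per class
    return int(scores.argmax())
```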

16 1.3 The Purposes of Learning Theory 1. How many training examples do we need to ensure a certain performance? 2. Given a fixed training sample, what performance of the learned function can be guaranteed? 3. Given two different learning algorithms, which one should we choose for a given training sample? All three questions are answered by means of a generalization error bound. Generalization error: how much we are misled in choosing the optimal function when generalizing from a given training sample to a general prediction function.

17 1.3 The Purposes of Learning Theory 1. Since the generalization error bound depends on the size of the training sample, we fix the error and solve for the training sample size. 2. The generalization error bound itself answers this. 3. Choose the algorithm with the smaller generalization error bound.
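For illustration, a standard Hoeffding-style generalization error bound for a finite set of candidate functions and a loss bounded in [0, 1] (a textbook example, not necessarily the more refined bounds derived later in the book); fixing the right-hand side and solving for m gives answer 1, evaluating the bound gives answer 2, and comparing two such bounds gives answer 3:

```latex
% With probability at least 1 - \delta over a random training sample of size m,
% simultaneously for all candidate functions f in the finite set \mathcal{F}:
R(f) \;\leq\; \hat{R}_m(f) \;+\; \sqrt{\frac{\ln\lvert\mathcal{F}\rvert + \ln(1/\delta)}{2m}}
% where R(f) is the generalization error and \hat{R}_m(f) the training error.
```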

