Learning Kernel Classifiers, 1. Introduction (2005.04.25). Summarized by In-Hee Lee.

1. Overview
- A short overview of supervised, unsupervised, and reinforcement learning.
- A taste of kernel classifiers.
- A discussion of which theoretical questions are of particular, and practical, importance.

1.1 The Learning Problem and (Statistical) Inference
The learning problem
- Given a sample of limited size, find a concise description of the data.
- Depending on the problem statement, supervised learning, unsupervised learning, and reinforcement learning are distinguished.

1.1 The Learning Problem and (Statistical) Inference
Supervised learning
- The given data is a sample of input-output patterns.
- A concise description of the data is a function that can produce the output given the input.
- Classification learning, preference learning, and function learning differ in the type of the output space.

1.1 The Learning Problem and (Statistical) Inference
Supervised learning (cont'd)
- Classification learning
  - The output space has no structure except whether two elements of the output space (classes) are equal or not.
  - Example: classifying images into the classes "image of the digit x".
- Preference learning
  - The output space is an order space: one can tell whether two elements of the output space (ranks) are equal, or which one is to be preferred.
  - Example: learning to arrange Web pages such that the most relevant pages are ranked highest.
- Function learning
  - The output space is a metric space such as the real numbers.
  - Gradient descent techniques can be used whenever the function is differentiable (see the sketch below).
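As a hedged illustration of the last point (not from the book): a minimal gradient-descent fit of a linear function to noisy input-output pairs, assuming a squared-error loss; the data, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

# Hypothetical 1-D function-learning task: fit y = w*x + b by gradient descent.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)   # noisy linear target (assumed data)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    err = pred - y                      # residuals
    grad_w = 2 * np.mean(err * x)       # d/dw of mean squared error
    grad_b = 2 * np.mean(err)           # d/db of mean squared error
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach w=2.0, b=0.5
```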

Figure: supervised learning as a mapping f from inputs to outputs.

1.1 The Learning Problem and (Statistical) Inference
Unsupervised learning
- Only a sample of objects is given, without associated target values.
- A concise description of the data could be a set of clusters or a probability density.
- Given a training sample of objects, extract some structure from them: "If some structure exists in the training objects, it is possible to take advantage of this redundancy and find a short description of the data."
- Example: clustering algorithms.

1.1 The Learning Problem and (Statistical) Inference
Unsupervised learning (cont'd)
- Clustering algorithms
  - Given a fixed number of clusters, find a grouping of the objects such that similar objects belong to the same cluster.
  - Related to mixture models.
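A hedged, minimal k-means sketch (not from the book) illustrating the "fixed number of clusters" idea; the toy data and cluster count are illustrative assumptions.

```python
import numpy as np

# Toy 2-D data drawn from two illustrative Gaussian blobs (assumed data, not book data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
               rng.normal(loc=(2, 2), scale=0.3, size=(50, 2))])

k = 2                                        # fixed number of clusters
centers = X[rng.choice(len(X), k, replace=False)]
for _ in range(20):
    # Assign each object to the nearest center (similar objects share a cluster).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each center to the mean of its assigned objects.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)   # cluster centers should lie near (0, 0) and (2, 2)
```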

1.1 The Learning Problem and (Statistical) Inference
Reinforcement learning
- Considers the scenario of a dynamic environment that results in state-action-reward triples as the data.
- Example: the problem of learning to play chess.
- The concise description of the data is a strategy that maximizes the reward over time.
- The learner is not told which actions to take in a given state.
- There is a trade-off between exploitation and exploration.
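A hedged toy sketch of learning from state-action-reward triples (not from the book): tabular Q-learning on an assumed 5-state chain environment, with an epsilon-greedy rule standing in for the exploitation-exploration trade-off.

```python
import numpy as np

# Hypothetical chain environment (an illustrative assumption, not from the book):
# states 0..4, action 0 moves left, action 1 moves right, reaching state 4 gives reward 1.
n_states, n_actions = 5, 2
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))     # action-value estimates (the learned "strategy")
alpha, gamma, eps = 0.5, 0.9, 0.5       # learning rate, discount, high exploration for this toy

for _ in range(300):                    # episodes of interaction with the dynamic environment
    s = 0
    for _ in range(200):                # cap on episode length
        # Exploration vs. exploitation: take a random action with probability eps.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Update from the (state, action, reward) triple; no one tells us the right action.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(Q.argmax(axis=1))   # learned strategy should prefer action 1 (right) in non-terminal states
```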

Figure: from a dynamic environment to a strategy for maximizing rewards.

1.2 Learning Kernel Classifiers
A typical classification problem
- Problem specification: design a system that can learn to recognize handwritten zip codes on mail envelopes.
- Representation: the concatenation of the rows of the image matrix of intensity values.
- Learning algorithm: the nearest-neighbor classifier.
  - To classify a new test image, assign it to the class of the training image closest to it.
  - Has almost optimal performance in the limit of a large number of training images.
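A hedged, minimal nearest-neighbor sketch (not the book's implementation): images are assumed to be flattened into intensity vectors, and Euclidean distance serves as the naive distance measure whose shortcomings are discussed on the next slide.

```python
import numpy as np

def nearest_neighbor_predict(train_X, train_y, test_x):
    """Assign the test image to the class of the closest training image.

    train_X: (n, d) array of flattened intensity vectors (rows of the image
             matrix concatenated, as in the slide); train_y: (n,) class labels.
    """
    dists = np.linalg.norm(train_X - test_x, axis=1)   # Euclidean distance to every training image
    return train_y[dists.argmin()]

# Illustrative toy data (an assumption, not real zip-code images):
rng = np.random.default_rng(0)
train_X = rng.random((100, 256))         # e.g. 100 flattened 16x16 intensity images
train_y = rng.integers(0, 10, size=100)  # digit labels 0..9
print(nearest_neighbor_predict(train_X, train_y, rng.random(256)))
```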

1.2 Learning Kernel Classifiers
Major problems of the nearest-neighbor classifier
1. It requires a distance measure that is small between images of the same digit and large between images showing different digits.
   - With the Euclidean distance, not all of the closest images belong to the correct class.
   - A better representation is needed.
2. It requires storing the whole training sample and computing the distance to every training image for each classification of a new image.
   - This becomes a computational problem as soon as the dataset grows beyond a few hundred examples.

1.2 Learning Kernel Classifiers
Kernel classifiers: addressing the second problem
- Use a linear classifier function for each class.
  - Simple and quickly computable.
  - Assigning similar values to images from one class is guaranteed.

1.2 Learning Kernel Classifiers
Kernel classifiers: addressing the first problem
- Use a generalized notion of a distance measure.
  - Feature mapping: φ maps each input x into a feature space.
  - Kernel: k(x, x′) = ⟨φ(x), φ(x′)⟩, the inner product of mapped inputs.
- For parameter vectors expressed as linear combinations of mapped training points, the classifier function becomes a linear combination of inner product functions in feature space: f(x) = Σᵢ αᵢ k(xᵢ, x).
- A linear function involving a kernel in this way is known as a kernel classifier.
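A hedged sketch (not the book's code) of a kernel classifier of the form f(x) = Σᵢ αᵢ k(xᵢ, x) with a Gaussian (RBF) kernel; the coefficients below are set by a simple illustrative rule (class labels used directly as αᵢ) rather than by a principled learning algorithm.

```python
import numpy as np

def rbf_kernel(x, x2, gamma=1.0):
    """Gaussian (RBF) kernel: a generalized notion of similarity between inputs."""
    return np.exp(-gamma * np.sum((x - x2) ** 2))

def kernel_classifier(train_X, alphas, x):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x); the sign gives the predicted class."""
    return sum(a * rbf_kernel(xi, x) for a, xi in zip(alphas, train_X))

# Toy binary problem (illustrative assumption): +1 / -1 labels used directly as alphas,
# which amounts to a simple Parzen-window-style rule, not the book's training method.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(20, 2))
X_neg = rng.normal(loc=-1.0, size=(20, 2))
train_X = np.vstack([X_pos, X_neg])
alphas = np.array([1.0] * 20 + [-1.0] * 20)

print(np.sign(kernel_classifier(train_X, alphas, np.array([1.0, 1.0]))))    # expect +1
print(np.sign(kernel_classifier(train_X, alphas, np.array([-1.0, -1.0]))))  # expect -1
```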

1.3 The Purposes of Learning Theory
1. How many training examples do we need to ensure a certain performance?
2. Given a fixed training sample, what performance of the learned function can be guaranteed?
3. Given two different learning algorithms, which one should we choose for a given training sample?
All three questions are answered by a generalization error bound.
- Generalization error: how much we are misled in choosing the optimal function when generalizing from a given training sample to a general prediction function.
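A hedged formalization in standard statistical-learning notation (not reproduced from these slides); the loss l, distribution P, and complexity term ε are generic placeholders.

```latex
% Expected (generalization) error of a prediction function f under the unknown
% data-generating distribution P, and its empirical counterpart on m training examples:
\[
  R[f] = \mathbb{E}_{(X,Y)\sim P}\bigl[l(f(X),Y)\bigr],
  \qquad
  R_{\mathrm{emp}}[f] = \frac{1}{m}\sum_{i=1}^{m} l(f(x_i),y_i).
\]
% A typical generalization error bound: with probability at least 1 - \delta over the
% random draw of the training sample, simultaneously for all f in the hypothesis class F,
\[
  R[f] \le R_{\mathrm{emp}}[f] + \varepsilon(m, \mathcal{F}, \delta),
\]
% where \varepsilon shrinks with the sample size m and grows with the complexity of F.
% Fixing \varepsilon and solving for m answers question 1; the bound itself answers
% question 2; comparing the bounds of two algorithms answers question 3.
```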

1.3 The Purposes of Learning Theory
How the generalization error bound answers the three questions:
1. Since the generalization error depends on the size of the training sample, fix the error and solve for the required sample size.
2. The generalization error bound itself gives the guaranteed performance.
3. Choose the algorithm with the smaller generalization error bound.